Series introduction


Cambridge International AS & A Level Mathematics can be a life-changing course. On the one hand, it is a 
facilitating subject: there are many university courses that either require an A Level or equivalent qualification in 
mathematics or prefer applicants who have it. On the other hand, it will help you to learn to think more precisely 
and logically, while also encouraging creativity. Doing mathematics can be like doing art: just as an artist needs to 
master her tools (use of the paintbrush, for example) and understand theoretical ideas (perspective, colour wheels 
and so on), so does a mathematician (using tools such as algebra and calculus, which you will learn about in this 
course). But this is only the technical side: the joy in art comes through creativity, when the artist uses her tools 

to express ideas in novel ways. Mathematics is very similar: the tools are needed, but the deep joy in the subject 
comes through solving problems. 


You might wonder what a mathematical ‘problem’ is. This is a very good question, and many people have offered 
different answers. You might like to write down your own thoughts on this question, and reflect on how they 
change as you progress through this course. One possible idea is that a mathematical problem is a mathematical 
question that you do not immediately know how to answer. (If you do know how to answer it immediately, then 
we might call it an ‘exercise’ instead.) Such a problem will take time to answer: you may have to try different 
approaches, using different tools or ideas, on your own or with others, until you finally discover a way into it. This 
may take minutes, hours, days or weeks to achieve, and your sense of achievement may well grow with the effort it 
has taken. 


In addition to the mathematical tools that you will learn in this course, the problem-solving skills that you 
will develop will also help you throughout life, whatever you end up doing. It is very common to be faced with 
problems, be it in science, engineering, mathematics, accountancy, law or beyond, and having the confidence to 
systematically work your way through them will be very useful. 


This series of Cambridge International AS & A Level Mathematics coursebooks, written for the Cambridge 
Assessment International Education syllabus for examination from 2020, will support you both to learn the 
mathematics required for these examinations and to develop your mathematical problem-solving skills. The new 
examinations may well include more unfamiliar questions than in the past, and having these skills will allow you 
to approach such questions with curiosity and confidence. 


In addition to problem solving, there are two other key concepts that Cambridge Assessment International 
Education have introduced in this syllabus: namely communication and mathematical modelling. These appear 
in various forms throughout the coursebooks. 


Communication in speech, writing and drawing lies at the heart of what it is to be human, and this is no less 
true in mathematics. While there is a temptation to think of mathematics as only existing in a dry, written form 
in textbooks, nothing could be further from the truth: mathematical communication comes in many forms, and 
discussing mathematical ideas with colleagues is a major part of every mathematician’s working life. As you study 
this course, you will work on many problems. Exploring them or struggling with them together with a classmate 
will help you both to develop your understanding and thinking, as well as improving your (mathematical) 
communication skills. And being able to convince someone that your reasoning is correct, initially verbally and 
then in writing, forms the heart of the mathematical skill of ‘proof’. 


Copyright Material - Review Only - Not for Redistribution 




Mathematical modelling is where mathematics meets the ‘real world’. There are many situations where people need 
to make predictions or to understand what is happening in the world, and mathematics frequently provides tools 
to assist with this. Mathematicians will look at the real world situation and attempt to capture the key aspects 

of it in the form of equations, thereby building a model of reality. They will use this model to make predictions, 
and where possible test these against reality. If necessary, they will then attempt to improve the model in order 

to make better predictions. Examples include weather prediction and climate change modelling, forensic science 
(to understand what happened at an accident or crime scene), modelling population change in the human, animal 
and plant kingdoms, modelling aircraft and ship behaviour, modelling financial markets and many others. In this 
course, we will be developing tools which are vital for modelling many of these situations. 


To support you in your learning, these coursebooks have a variety of new features, for example: 


• Explore activities: These activities are designed to offer problems for classroom use. They require thought and 
deliberation: some introduce a new idea, others will extend your thinking, while others can support consolidation. 
The activities are often best approached by working in small groups and then sharing your ideas with each other 
and the class, as they are not generally routine in nature. This is one of the ways in which you can develop problem- 
solving skills and confidence in handling unfamiliar questions. 

• Questions labelled as P, M or PS: These are questions with a particular emphasis on ‘Proof’, ‘Modelling’ or 
‘Problem solving’. They are designed to support you in preparing for the new style of examination. They may or 
may not be harder than other questions in the exercise. 

• The language of the explanatory sections makes much more use of the words ‘we’, ‘us’ and ‘our’ than in previous 
coursebooks. This language invites and encourages you to be an active participant rather than an observer, simply 
following instructions (‘you do this, then you do that’). It is also the way that professional mathematicians usually 
write about mathematics. The new examinations may well present you with unfamiliar questions, and if you are 
used to being active in your mathematics, you will stand a better chance of being able to successfully handle such 
challenges. 


At various points in the books, there are also web links to relevant Underground Mathematics resources, 

which can be found on the free undergroundmathematics.org website. Underground Mathematics has the aim 

of producing engaging, rich materials for all students of Cambridge International AS & A Level Mathematics 
and similar qualifications. These high-quality resources have the potential to simultaneously develop your 
mathematical thinking skills and your fluency in techniques, so we do encourage you to make good use of them. 


We wish you every success as you embark on this course. 


Julian Gilbey 
London, 2018 


Past exam paper questions throughout are reproduced by permission of Cambridge Assessment International Education. 
Cambridge Assessment International Education bears no responsibility for the example answers to questions taken from its 
past question papers which are contained in this publication. 


The questions, example answers, marks awarded and/or comments that appear in this book were written by the author(s). In 
examination, the way marks would be awarded to answers like these may be different. 
How to use this book




Throughout this book you will notice particular features that are designed to help your learning. 
This section provides a brief overview of these features. 


Learning objectives indicate the important concepts within each chapter and help you to
navigate through the coursebook.

Prerequisite knowledge exercises identify prior learning that you need to have covered before
starting the chapter. Try the questions to identify any areas that you need to review before
continuing with the chapter.

Key point boxes contain a summary of the most important methods, facts and formulae.


WORKED EXAMPLE 5.14

How many distinct three-digit numbers can be made from five cards, each with one
of the digits 5, 5, 7, 8 and 9 written on it?

Answer

The 5 is a repeated digit, so we must investigate three situations separately.

No 5s selected: ³P₃ = 6 three-digit numbers. The digits 7, 8 and 9 are selected and arranged.

One 5 selected: ³C₂ × 3! = 18 three-digit numbers. Two digits from 7, 8 and 9 are selected and
arranged with a 5.

Two 5s selected: ³C₁ × 3!/2! = 9 three-digit numbers. One digit from 7, 8 and 9 is selected and
arranged with two 5s.

6 + 18 + 9 = 33 three-digit numbers can be made.

Worked examples provide step-by-step approaches to answering questions. The left side shows a
fully worked solution, while the right side contains a commentary explaining each step in the working.

Key terms are important terms in the topic that you are learning. They are highlighted in orange bold.
The glossary contains clear definitions of these key terms.


EXPLORE 3.5

The following table shows three students’ marks out of 20 in the same five tests.

[Table of marks]

Note that Buti’s marks are consistently 1 less than Amber’s and that Chen’s marks are
consistently 3 more than Amber’s. This is indicated in the last column of the table.

For each student, calculate the variance and standard deviation.

Can you explain your results, and do they apply equally to the range and
interquartile range?

Explore boxes contain enrichment activities for extension work. These activities promote group-work
and peer-to-peer discussion, and are intended to deepen your understanding of a concept. (Answers to
the Explore questions are provided in the Teacher’s Resource.)


TIP

A variable is denoted by an upper-case letter and its possible values by the same lower-case letter.

Tip boxes contain helpful guidance about calculating or checking your answers.
FAST FORWARD


Later in this section, we will see how any normal variable can be transformed to
the standard normal variable by coding.


REWIND


Recall from Chapter 4, Section 4.1 that P(A) = 1 − P(A′).


Rewind and Fast forward boxes direct you to related 
learning. Rewind boxes refer to earlier learning, in case you 
need to revise a topic. Fast forward boxes refer to topics 
that you will cover at a later stage, in case you would like to 
extend your study. 


DID YOU KNOW?


It has long been common practice to write Q.E.D. at the point where a mathematical proof 
or philosophical argument is complete. Q.E.D. is an initialism of the Latin phrase quod erat 
demonstrandum, meaning ‘which is what had to be shown’. 


Latin was used as the language of international communication, scholarship and science until well 
into the 18th century. 


Q.E.D. does not stand for Quite Easily Done! 


A popular modern alternative is to write W⁵, an abbreviation of Which Was What Was Wanted.


Did you know? boxes contain interesting facts showing 
how Mathematics relates to the wider world. 


Checklist of learning and understanding

• Commonly used measures of variation are the range, interquartile range and standard deviation.

• A box-and-whisker diagram shows the smallest and largest values, the lower and upper quartiles
and the median of a set of data.


At the end of each chapter there is a Checklist of 
learning and understanding and an End-of-chapter 
review exercise. 


The checklist contains a summary of the concepts that 
were covered in the chapter. You can use this to quickly 
check that you have covered the main topics. 


Cross-topic review exercises appear after several 
chapters, and cover topics from across the preceding 
chapters. 




Extension material goes beyond the syllabus. It is highlighted by a red line to the left of
the text.


Throughout each chapter there are multiple exercises 
containing practice questions. The questions are coded: 


PS  These questions focus on problem-solving.

P   These questions focus on proofs.

M   These questions focus on modelling.

You should not use a calculator for these questions.

You can use a calculator for these questions.

These questions are taken from past examination papers.


END-OF-CHAPTER REVIEW EXERCISE 8 


1 A continuous random variable, X, has a normal distribution with mean 8 and standard deviation σ. 
Given that P(X > 5) = 0.9772, find P(X < 9.5). [3] 


2 The variable Y is normally distributed. Given that 10σ = 3μ and P(Y < 10) = 0.75, find P(Y ≥ 6). [4] 


3 In Scotland, in November, on average 80% of days are cloudy. Assume that the weather on any one day is 


independent of the weather on other days. 


The End-of-chapter review contains exam-style 
questions covering all topics in the chapter. You can 
use this to check your understanding of the topics you 
have covered. The number of marks gives an indication 
of how long you should be spending on the question. 
You should spend more time on questions with higher 
mark allocations; questions with only one or two marks 
should not need you to spend time doing complicated 
calculations or writing long explanations. 


CROSS-TOPIC REVIEW EXERCISE 2 


1 Each of the eight players in a chess team plays 12 games against opponents from other teams. The tot 
wins, draws and losses for the whole team are denoted by X,Y and Z, respectively. 


a State the value of X + Y + Z.
In this chapter you will learn how to:
• display numerical data in stem-and-leaf diagrams, histograms and cumulative frequency graphs
• interpret statistical data presented in various forms
• select an appropriate method for displaying data.
PREREQUISITE KNOWLEDGE


Where it comes from | What you should be able to do | Check your skills

IGCSE® / O Level Mathematics | Obtain appropriate upper and lower bounds to solutions of simple problems when given data to a specified accuracy. | 1 A rectangular plot measures 20 m by 12 m, both to the nearest metre. Find: a its least possible perimeter; b the upper boundary of its area.

IGCSE / O Level Mathematics | Construct and interpret histograms with equal and unequal intervals. | 2 A histogram is drawn to represent two classes of data. The column widths are 3 cm and 4 cm, and the column heights are 8 cm and 6 cm, respectively. What do we know about the frequencies of these two classes?

IGCSE / O Level Mathematics | Construct and use cumulative frequency diagrams. | 3 The heights of 50 trees are measured: 17 trees are less than 3 m; 44 trees are less than 4 m; and all of the trees are less than 5 m. Determine, by drawing a cumulative frequency diagram, how many trees have heights: a between 3 and 4 m; b of 4 m or more.


Why do we collect, display and analyse data? 

We can collect data by gathering and counting, taking surveys, giving out questionnaires 
or by taking measurements. We display and analyse data so that we can describe the 
things, both physical and social, that we see and experience around us. We can also find 
answers to questions that might not be immediately obvious, and we can also identify 
questions for further investigation. 


Improving our data-handling skills will allow us to better understand and evaluate 
the large amounts of statistical information that we meet daily. We find it in the media 
and from elsewhere: sports news, product advertisements, weather updates, health and 
environmental reports, service information, political campaigning, stock market reports 
and forecasts, and so on. 


Through activities that involve data handling, we naturally begin to formulate questions. 
This is a valuable skill that helps us to make informed decisions. We also acquire skills that 
enable us to recognise some of the inaccurate ways in which data can be represented and 
analysed, and to develop the ability to evaluate the validity of someone else’s research. 


1.1 Types of data 

There are two types of data: qualitative (or categorical) data are described by words and 
are non-numerical, such as blood types or colours. Quantitative data take numerical values 
and are either discrete or continuous. As a general rule, discrete data are counted and 
cannot be made more precise, whereas continuous data are measurements that are given to 
a chosen degree of accuracy. 


Discrete data can take only certain values, as shown in the diagram. 


[Diagram: separate points marked along a number line across the range]
Chapter 1: Representation of data


The number of letters in the words of a book is an example of discrete quantitative data. 
Each word has 1 or 2 or 3 or 4 or ... letters. There are no words with 3½ or 4.75 letters. 


Discrete quantitative data can take non-integer values. For example, United States coins 
have dollar values of 0.01, 0.05, 0.10, 0.25, 0.50 and 1.00. In Canada, the United Kingdom 
and other countries, shoe sizes such as 6½, 7 and 7½ are used. 

Continuous data can take any value (possibly within a limited range), as shown in the diagram. 
[Diagram: an unbroken line segment across the range]

The times taken by the athletes to complete a 100-metre race is an example of continuous 
quantitative data. We can measure these to the nearest second, tenth of a second or even 
more accurately if we have the necessary equipment. The range of times is limited to 
positive real numbers. 


KEY POINT 1.1


Discrete data can take only certain values. 
Continuous data can take any value, possibly within a limited range. 


1.2 Representation of discrete data: stem-and-leaf diagrams 

A stem-and-leaf diagram is a type of table best suited to representing small amounts of 
discrete data. The last digit of each data value appears as a leaf attached to all the other 
digits, which appear in a stem. The digits in the stem are ordered vertically, and the digits 
on the leaves are ordered horizontally, with the smallest digit placed nearest to the stem. 


Each row in the table forms a class of values. The rows should have intervals of equal width 
to allow for easy visual comparison of sets of data. A key with the appropriate unit must be 
included to explain what the values in the diagram represent. 


Stem-and-leaf diagrams are particularly useful because raw data can still be seen, and two sets 
of related data can be shown back-to-back for the purpose of making comparisons. 


Consider the raw percentage scores of 15 students in a Physics exam, given in the following 
list: 58, 55, 58, 61, 72, 79, 97, 67, 61, 77, 92, 64, 69, 62 and 53. 


To present the data in a stem-and-leaf diagram, we first group the scores into suitable 
equal-width classes. 


Class widths of 10 are suitable here, as shown below. 


5 | 8 5 8 3
6 | 1 7 1 4 9 2
7 | 2 9 7
8 |
9 | 7 2

Next, we arrange the scores in each row in ascending order from left to right and add a key
to produce the stem-and-leaf diagram shown below.

5 | 3 5 8 8          Key: 5 | 3
6 | 1 1 2 4 7 9      represents
7 | 2 7 9            a score of 53%
8 |
9 | 2 7


TIP

The diagram should have a bar chart-like shape, which is achieved by aligning the leaves in columns.

It is advisable to redraw the diagram if any errors are noticed, or to complete it in pencil, so that
accuracy can be maintained.


In a back-to-back stem-and-leaf diagram, the leaves to the right of the stem ascend left to right, 
and the leaves on the left of the stem ascend right to left (as shown in Worked example 1.1). 
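
The construction described in this section (split each value into a stem and a leaf, group the leaves by stem, then order each row) can be sketched in a few lines of Python. This is only an illustration and not part of the coursebook: the function name `stem_and_leaf` is made up for the example, and it assumes non-negative whole-number data with classes of width 10.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Return ordered (stem, leaves) rows for non-negative integers.

    Assumption for this sketch: each row is a class of width 10,
    so the last digit is the leaf and the remaining digits are the stem.
    """
    rows = defaultdict(list)
    for value in values:
        stem, leaf = divmod(value, 10)
        rows[stem].append(leaf)
    lowest, highest = min(rows), max(rows)
    # Keep empty stems so that every class of equal width appears.
    return [(stem, sorted(rows.get(stem, []))) for stem in range(lowest, highest + 1)]


# Physics exam scores from the text.
scores = [58, 55, 58, 61, 72, 79, 97, 67, 61, 77, 92, 64, 69, 62, 53]

for stem, leaves in stem_and_leaf(scores):
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
print("Key: 5 | 3 represents a score of 53%")
```

Printing the rows with single spaces between leaves keeps them aligned in columns, which gives the bar chart-like shape.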
WORKED EXAMPLE 1.1


The number of days on which rain fell in a certain town in each month of 2016
and 2017 are given.

2016
Jan: 17   Feb: 20   Mar: 13   Apr: 12   May: 10   Jun: 8
Jul: 0    Aug: 1    Sep: 5    Oct: 11   Nov: 16   Dec: 9

2017
Jan: 9    Feb: 13   Mar: 11
Jul: 1    Aug: 2    Sep: 2

Display the data in a back-to-back stem-and-leaf diagram and briefly compare
the rainfall in 2016 with the rainfall in 2017.

Answer

       2016 |   | 2017
  9 8 5 1 0 | 0 | 1 2 2 3 4 6 7 8 8 9
7 6 3 2 1 0 | 1 | 1 3
          0 | 2 |

Key: 5 | 0 | 6 represents 5 days in a month of 2016 and 6 days in a month of 2017

We group the values for the months of each year into classes 0–9, 10–19 and 20–29, then arrange
the values in each class in order with a key, as shown.

It rained on more days in 2016 (122 days) than it did in 2017 (74 days).

No information is given about the amount of rain that fell, so it would be a mistake to say that
more rain fell in 2016 than in 2017.


TIP

If rows of leaves are particularly long, repeated values may be used in the stem (but this is not
necessary in Worked example 1.1). However, if there were, say, 30 leaves in one of the rows, we might
consider grouping the data into narrower classes of 0–4, 5–9, 10–14, 15–19 and 20–24. This would
require 0, 0, 1, 1 and 2 in the stem.


KEY POINT

Data in a stem-and-leaf diagram are ordered in rows of equal widths.


EXERCISE 1A 


1 Twenty people leaving a cinema are each asked, “How many times have you attended the cinema in the past 
year?” Their responses are: 


6, 2, 13, 1, 4, 8, 11, 3, 4, 16, 7, 20, 13, 5, 15, 3, 12, 9, 26 and 10. 


Construct a stem-and-leaf diagram for these data and include a key. 


2 A shopkeeper takes 12 bags of coins to the bank. The bags contain the following numbers of coins: 
150, 163, 158, 165, 172, 152, 160, 170, 156, 162, 159 and 175. 
a Represent this information in a stem-and-leaf diagram. 


b Each bag contains coins of the same value, and the shopkeeper has at least one bag containing coins with 
dollar values of 0.10, 0.25, 0.50 and 1.00 only. 


What is the greatest possible value of all the coins in the 12 bags? 


3 This stem-and-leaf diagram shows the number of employees at 20 companies.

1 | 0 8 8 8 8 9 9          Key: 1 | 0
2 | 0 5 6 6 7 7 8 9        represents 10
3 | 0 1 1 2 9              employees


a What is the most common number of employees?


b How many of the companies have fewer than 25 employees? 
c What percentage of the companies have more than 30 employees? 


d Determine which of the three rows in the stem-and-leaf diagram contains the smallest number of: 


i companies ii employees. 

4 Over a 14-day period, data were collected on the number of passengers travelling on two ferries,
A and Z. The results are presented below.

Ferry A (14) |   | Ferry Z (14)
       8 7 6 | 2 |
     7 6 4 0 | 3 | 0 5 8
     8 6 5 3 | 4 | 3 4 5 7 7 7
       5 3 3 | 5 | 0 2 6 6 9

Key: 3 | 5 | 0 represents 53 passengers on A and 50 passengers on Z


a How many more passengers travelled on ferry Z than on ferry A?

b The cost of a trip on ferry A is $12.50 and the cost of a trip on ferry Z is $x. The takings on ferry Z were 
$3.30 less than the takings on ferry A over this period. Find the value of x. 


c Find the least and greatest possible number of days on which the two ferries could have carried exactly the 
same number of passengers. 
5 The runs scored by two batsmen in 15 cricket matches last season were: 
Batsman P: 53, 41, 57, 38, 41, 37, 59, 48, 52, 39, 47, 36, 37, 44, 59. 
Batsman Q: 56, 48, 31, 64, 21, 52, 45, 36, 57, 68, 77, 20, 42, 51, 71. 
a Show the data in a diagram that allows easy comparison of the two performances. 
b Giving a reason for your answer, decide which of the batsmen performed: 


i better ii more consistently. 


6 The total numbers of eggs laid in the nests of two species of bird were recorded over several breeding 
seasons. 


The numbers of eggs laid in the nests of 10 wrens and 10 dunnocks are: 

Wrens: 22, 18, 21, 23, 17, 23, 20, 19, 24, 13. 

Dunnocks: 28, 24, 23, 19, 30, 27, 22, 25, 22, 17. 

a Represent the data in a back-to-back stem-and-leaf diagram with rows of width 5. 


b Given that all of these eggs hatched and that the survival rate for dunnock chicks is 92%, estimate the 
number of dunnock chicks that survived. 


c Find the survival rate for the wren chicks, given that 14 did not survive. 


PS 7 This back-to-back stem-and-leaf diagram shows the percentage scores of the 25 students who were the top 
performers in an examination. 


[Back-to-back stem-and-leaf diagram: Girls (12) on the left and Boys (13) on the right, with stems 8, 8, 9 and 9.
Key: 1 | 8 | 2 represents 81% for a girl and 82% for a boy.]

The 25 students are arranged in a line in the order of their scores. Describe the student in the 
middle of the line and find the greatest possible number of boys in the line who are not standing 
next to a girl. 


Cambridge International AS & A Level Mathematics: Probability & Statistics 1 


1.3 Representation of continuous data: histograms 
Continuous data are given to a certain degree of accuracy, such as 3 significant figures, 
2 decimal places, to the nearest 10 and so on. We usually refer to this as rounding. 


When values are rounded, gaps appear between classes of values and this can lead to a 
misunderstanding of continuous data because those gaps do not exist. 


Consider heights to the nearest centimetre, given as 146-150, 151-155 and 156-160. 
Gaps of 1 cm appear between classes because the values are rounded. 


Using h for height, the actual classes are 145.5 ≤ h < 150.5, 150.5 ≤ h < 155.5 and 
155.5 ≤ h < 160.5 cm. 


The classes are shown in the diagram below, with the lower and upper boundary values and 
the class mid-values (also called midpoints) indicated. 


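[Diagram: number line from 145.5 to 160.5 showing the classes given as 146–150, 151–155 and 156–160,
with boundaries at 145.5, 150.5, 155.5 and 160.5 and mid-values at 148, 153 and 158]


Lower class boundaries are 145.5, 150.5 and 155.5 cm. 
Upper class boundaries are 150.5, 155.5 and 160.5 cm. 


Class widths are 150.5 − 145.5 = 5, 155.5 − 150.5 = 5 and 160.5 − 155.5 = 5. 

Class mid-values are (145.5 + 150.5)/2 = 148, (150.5 + 155.5)/2 = 153 and (155.5 + 160.5)/2 = 158.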
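
As a rough check of the boundary, width and mid-value calculations above, the following Python sketch recovers them from the rounded class limits. The function name `class_summary` and its `precision` parameter are illustrative choices, not notation from the coursebook; the sketch assumes values were rounded to the nearest `precision` units (1 cm here).

```python
def class_summary(lower, upper, precision=1):
    """Return (lower boundary, upper boundary, width, mid-value) for a
    class recorded as lower-upper after rounding to `precision` units."""
    half = precision / 2
    lower_boundary = lower - half
    upper_boundary = upper + half
    width = upper_boundary - lower_boundary
    mid_value = (lower_boundary + upper_boundary) / 2
    return lower_boundary, upper_boundary, width, mid_value


# Heights to the nearest centimetre, as given in the text.
for lower, upper in [(146, 150), (151, 155), (156, 160)]:
    print(f"{lower}-{upper}:", class_summary(lower, upper))
```

Note that the width comes out as 5 for each class, not 4, because it is measured between boundaries rather than between the rounded limits.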

A histogram is best suited to illustrating continuous data but it can also be used to 
illustrate discrete data. We might have to group the data ourselves or it may be given to us 
in a grouped frequency table, such as those presented in the tables below, which show the 


ages and the percentage scores of 100 students who took an examination. 


Age (A years)     | 16 ≤ A < 18 | 18 ≤ A < 20 | 20 ≤ A < 22
No. students (f)  | 34          | 46          | 20

Score (%)         | 10–29 | 30–59 | 60–79 | 80–99
No. students (f)  | 6     | 21    | 60    | 13


TIP

‘No.’ is the abbreviation used for ‘Number of’ throughout this book.


The first table shows three classes of continuous data; there are no gaps between the classes 
and the classes have equal-width intervals of 2 years. This means that we can represent 
the data in a frequency diagram by drawing three equal-width columns with column 
heights equal to the class frequencies, as shown below.

[Frequency diagram: frequency against age (years), three equal-width columns]


TIP

We concertina part of an axis to show that a range of values has been omitted.
The following table shows the areas of the columns and the frequency of each of the three 
classes presented in the diagram on the previous page. 


Area       | 2 × 34 = 68 | 2 × 46 = 92 | 2 × 20 = 40
Frequency  | 34          | 46          | 20


From this table we can see that the ratio of the column areas, 68 : 92 : 40, is exactly the same 
as the ratio of the frequencies, 34: 46: 20. 


In a histogram, the area of a column represents the frequency of the corresponding class, 
so that the area must be proportional to the frequency. 

We may see this written as ‘area ∝ frequency’.


TIP

The symbol ∝ means ‘is proportional to’.


This also means that in every histogram, just as in the example above, the ratio of column 
areas is the same as the ratio of the frequencies, even if the classes do not have equal 
widths. 


Also, there can be no gaps between the columns in a histogram because the upper boundary 
of one class is equal to the lower boundary of the neighbouring class. A gap can appear only 
when a class has zero frequency. 


The axis showing the measurements is labelled as a continuous number line, and the width 
of each column is equal to the width of the class that it represents. 


When we construct a histogram, since the classes may not have equal widths, the height 
of each column is no longer determined by the frequency alone, but must be calculated so 
that area ∝ frequency. 


The vertical axis of the histogram is labelled frequency density, which measures frequency 
per standard interval. The simplest and most commonly used standard interval is 1 unit of 
measurement. 


For example, a column representing 85 objects with masses from 50 to 60 kg has 
a frequency density of 85 objects ÷ (60 − 50) kg = 8.5 objects per kilogram or 0.0085 objects 
per gram and so on. 


For a standard interval of 1 unit of measurement,

Frequency density = class frequency ÷ class width

which can be rearranged to give 

Class frequency = class width × frequency density 


In a histogram, we can see the relative frequencies of classes by comparing column areas, 
and we can make estimates by assuming that the values in each class are spread evenly over 
the whole class interval. 
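
The relationship between frequency, class width and frequency density can be checked numerically. The short Python sketch below is an illustration only (the function name `frequency_density` is made up for the example) and assumes a standard interval of 1 unit of measurement. It reproduces the 8.5 objects per kilogram figure and confirms that, for the ages table in this section, column area equals class frequency.

```python
def frequency_density(frequency, lower_boundary, upper_boundary):
    """Frequency per 1 unit of measurement for a single class."""
    return frequency / (upper_boundary - lower_boundary)


# 85 objects with masses from 50 to 60 kg.
print(frequency_density(85, 50, 60), "objects per kg")

# Ages of 100 students: (lower boundary, upper boundary, frequency).
age_classes = [(16, 18, 34), (18, 20, 46), (20, 22, 20)]

# Column area = class width x frequency density.
areas = [
    (upper - lower) * frequency_density(freq, lower, upper)
    for lower, upper, freq in age_classes
]
print(areas)
```

With a different standard interval the areas would be scaled by a constant, but their ratio would still match the ratio of the frequencies.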
WORKED EXAMPLE 1.2


The masses, m kg, of 100 children are grouped into two classes, as shown in the table.

Mass (m kg)       | 40 ≤ m < 50 | 50 ≤ m < 70
No. children (f)  | 40          | 60

a Illustrate the data in a histogram.

b Estimate the number of children with masses between 45 and 63 kg.


Answer

a
Mass (m kg)        | 40 ≤ m < 50  | 50 ≤ m < 70
No. children (f)   | 40           | 60
Class width (kg)   | 10           | 20
Frequency density  | 40 ÷ 10 = 4  | 60 ÷ 20 = 3

Frequency density is calculated for the unequal-width intervals in the table. The masses
are represented in the histogram, where frequency density measures number of
children per 1 kg or simply children per kg.

[Histogram: frequency density against mass (m kg), with columns of height 4 on 40–50 and height 3 on 50–70]

Column areas are equal to class frequencies. For example, the area of the first column is

(50 − 40) kg × 4 children ÷ 1 kg = 40 children.


TIP

If we drew column heights of 8 and 6 instead of 4 and 3, then frequency density would
measure children per 2 kg. The area of the first column would be

(50 − 40) kg × 8 children ÷ 2 kg = 40 children.


b There are children with masses from 45 to 63 kg in both classes, so we must
split this interval into two parts: 45–50 and 50–63.

½ × 40 = 20 children
The class 40–50 kg has a mid-value of 45 kg and a frequency of 40.

Frequency = width × frequency density
= (63 − 50) kg × 3 children ÷ 1 kg
= 13 × 3 children
= 39 children
Our estimate for the interval 50–63 kg is equal to the area corresponding to
this section of the second column.

Our estimate is 20 + 39 = 59 children.
We add together the estimates for the two intervals.
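
The estimate in part b of Worked example 1.2 relies on the assumption that values are spread evenly over each class interval. Under that assumption, the calculation can be sketched in Python as follows; the function name `estimate_count` and the tuple layout are illustrative, not the coursebook's notation.

```python
def estimate_count(classes, a, b):
    """Estimate how many values lie between a and b.

    `classes` is a list of (lower boundary, upper boundary, frequency).
    Assumes values are spread evenly within each class, so each class
    contributes its frequency scaled by the fraction of its width
    that overlaps the interval from a to b.
    """
    total = 0.0
    for lower, upper, freq in classes:
        overlap = max(0.0, min(b, upper) - max(a, lower))
        total += freq * overlap / (upper - lower)
    return total


# Masses of 100 children from Worked example 1.2.
masses = [(40, 50, 40), (50, 70, 60)]
print(estimate_count(masses, 45, 63))  # 20 from 45-50 plus 39 from 50-63
```

This is the same as adding the areas of the parts of the histogram columns that lie between 45 and 63 kg.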
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Consider the times taken, to the nearest minute, for 36 athletes to complete a race, as given 
in the table below. 


Time taken (min) | 13 | 14-15 | 16-18
No. athletes (f) | 4 | 14 | 18

Gaps of 1 minute appear between classes because the times are rounded.

Use class boundaries (rather than rounded values) to find class widths, otherwise incorrect frequency densities will be obtained.


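Frequency densities are calculated in the following table.

Time taken (t min) | 12.5 ≤ t < 13.5 | 13.5 ≤ t < 15.5 | 15.5 ≤ t < 18.5
No. athletes (f) | 4 | 14 | 18
Class width (min) | 1 | 2 | 3
Frequency density | 4 ÷ 1 = 4 | 14 ÷ 2 = 7 | 18 ÷ 3 = 6

The class with the highest frequency does not necessarily have the highest frequency density.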
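The step from rounded classes to class boundaries, and then to frequency densities, can be sketched in code. This Python example is illustrative only: the helper `boundaries` and its `precision` parameter are assumed names, and it takes the data to be rounded to the nearest `precision` units.

```python
def boundaries(first, last, precision=1):
    """Class boundaries for values rounded to the nearest `precision`.

    A class recorded as first-last covers true values from
    first - precision/2 up to last + precision/2.
    """
    half = precision / 2
    return first - half, last + half


# Race-time classes (to the nearest minute) with their frequencies.
rounded = [((13, 13), 4), ((14, 15), 14), ((16, 18), 18)]

for (first, last), freq in rounded:
    lower, upper = boundaries(first, last)
    width = upper - lower
    print(f"{lower}-{upper}: width {width}, frequency density {freq / width}")
```

Running it should give widths of 1, 2 and 3 minutes and frequency densities of 4, 7 and 6 athletes per minute, as in the table.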


This histogram represents the race times, where frequency density measures athletes per minute.

[Histogram: frequency density against time taken (t min), with class boundaries at 12.5, 13.5, 15.5 and 18.5.]

Do think carefully about the scales you use when constructing a histogram or any other type of diagram. Sensible scales, such as 1 cm for 1, 5, 10, 20 or 50 units, allow you to read values with much greater accuracy than scales such as 1 cm for 3, 7 or 23 units. For similar reasons, try to use as much of the sheet of graph paper as possible, ensuring that the whole diagram will fit before you start to draw it.


Use the histogram of race times shown previously to estimate: 


a the number of athletes who took less than 13.0 minutes 


b the number of athletes who took between 14.5 and 17.5 minutes 


c the time taken to run the race by the slowest three athletes. 


Answer

a 0.5 × 4 = 2 athletes

b (1 × 7) + (2 × 6) = 19 athletes

c Between 18.0 and 18.5 minutes.
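These estimates from the race-times histogram can be checked with a short calculation. The sketch below is a Python illustration under the usual assumption that values are spread evenly within each class; `count_below` and `cutoff_for_top` are hypothetical helper names.

```python
# (lower boundary, upper boundary, frequency) for the race times.
race = [(12.5, 13.5, 4), (13.5, 15.5, 14), (15.5, 18.5, 18)]


def count_below(classes, x):
    """Estimated number of values below x, assuming even spread in each class."""
    total = 0.0
    for lower, upper, freq in classes:
        if x >= upper:
            total += freq
        elif x > lower:
            total += freq * (x - lower) / (upper - lower)
    return total


def cutoff_for_top(classes, k):
    """Estimated lower boundary of the largest k values."""
    remaining = k
    for lower, upper, freq in reversed(classes):
        if remaining <= freq:
            return upper - remaining * (upper - lower) / freq
        remaining -= freq
    raise ValueError("k exceeds the total frequency")


print(count_below(race, 13.0))                            # a: 2.0
print(count_below(race, 17.5) - count_below(race, 14.5))  # b: 19.0
print(cutoff_for_top(race, 3))                            # c: 18.0
```

For part c, the slowest three athletes are estimated to lie between 18.0 minutes and the upper boundary of 18.5 minutes.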


Cambridge International AS & A Level Mathematics: Probability & Statistics 1


Refer back to the table in Section 1.3 that shows the percentage scores of 100 students who took an examination.

Discuss what adjustments must be made so that the data can be represented in a histogram.

How could we make these adjustments and is there more than one way of doing this?

It is not acceptable to draw the axes or the columns of a histogram freehand. Always use a ruler!


EXERCISE 1B


1 In a particular city there are 51 buildings of historical interest. The following table presents the ages of these buildings, given to the nearest 50 years.


Age (years) | 50-150 | 200-300 | 350-450 | 500-600
No. buildings (f) | 15 | 18 | 12 | 6


a Write down the lower and upper boundary values of the class containing the greatest number of buildings. 


b State the widths of the four class intervals. 


c Illustrate the data in a histogram.


d Estimate the number of buildings that are between 250 and 400 years old. 


2 The masses, m grams, of 690 medical samples are given in the following table.


Mass (m grams) | 4 ≤ m < 12 | 12 ≤ m < 24 | 24 ≤ m < 28
No. medical samples (f) | 224 | 396 | p


a Find the value of p that appears in the table. 


b On graph paper, draw a histogram to represent the data. 


c Calculate an estimate of the number of samples with masses between 8 and 18 grams.


3 The table below shows the heights, in metres, of 50 boys and of 50 girls.


Height (m) | 1.2- | 1.3- | 1.6- | 1.8-1.9
No. boys (f) | 7 | 11 | 26 | 6
No. girls (f) | 10 | 22 | 16 | 2


a How many children are between 1.3 and 1.6 metres tall?
b Draw a histogram to represent the heights of all the boys and girls together.


c Estimate the number of children whose heights are 1.7 metres or more. 


4 The heights of 600 saplings are shown in the following table. 


Height (cm) | 0- | 5- | 15- | 30-u
No. saplings (f) | 64 | 232 | 240 | 64


a Suggest a suitable value for u, the upper boundary of the data. 


b Illustrate the data in a histogram. 
c Calculate an estimate of the number of saplings with heights that are: 


i less than 25cm ii between 7.5 and 19.5cm. 
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5 Each of the 70 trainees at a secretarial college was asked to type a copy of a particular document. The times 
taken are shown, correct to the nearest 0.1 minutes, in the following table. 


Time taken (min) | 2.6-2.8 | 2.9-3.0 | 3.1-3.2 | 3.3-3.7
No. trainees (f) | 15 | 25 | 20 | 10


a Explain why the interval for the first class has a width of 0.3 minutes.

b Represent the times taken in a histogram.

c Estimate, to the nearest second, the upper boundary of the times taken by the fastest 10 typists.

d It is given that 15 trainees took between 3.15 and b minutes. Calculate an estimate for the value of b when:

i b > 3.15 ii b < 3.15.


6 A railway line monitored 15% of its August train journeys to find their departure delay times. The results are 
shown below. It is given that 24 of these journeys were delayed by less than 2 minutes. 


[Histogram: frequency density against time (min); horizontal axis marked at 0, 2, 4, 12, 18 and 20.]


a How many journeys were monitored?

b Calculate an estimate of the number of these journeys that were delayed by:
i 1 to 3 minutes ii 10 to 15 minutes.

c Show that a total of 2160 journeys were provided in August. 


d Calculate an estimate of the number of August journeys that were delayed by 3 to 7 minutes. State any 
assumptions that you make in your calculations. 


7 A university investigated how much space on its computers’ hard drives is used for data storage. The results 
are shown below. It is given that 40 hard drives use less than 20GB for data storage. 


[Histogram: frequency density against storage space (GB); horizontal axis marked at 0, 20, 60, 120 and 200.]


a Find the total number of hard drives represented.
b Calculate an estimate of the number of hard drives that use less than 50GB. 


c Estimate the value of k, if 25% of the hard drives use k GB or more. 
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8 The lengths of the 575 items in a candle maker’s 
workshop are represented in the histogram. 


a What proportion of the items are less than 25 cm long?

b Estimate the number of items that are between 12.4 and 36.8 cm long.

c The shortest 20% of the workshop’s items are to be recycled. Calculate an estimate of the length of the shortest item that will not be recycled.

[Histogram: frequency density against length (l cm).]
9 The thicknesses, k mm, of some steel sheets are represented in the histogram. It is given that k < 0.4 for 180 sheets.

[Histogram: frequency density against thickness (k mm); horizontal axis marked from 0 to 1.2 in steps of 0.2.]

a Find the ratio between the frequencies of the three classes. Give your answer in simplified form.

b Find the value of n, given that frequency density measures sheets per n mm.

c Calculate an estimate of the number of sheets for which:

i k < 0.5 ii 0.75 < k < 0.94


d The sheets are classified as thin, medium or thick in the ratio 1:3:1. 


Estimate the thickness of a medium sheet, giving your answer in the form a < k <b. How accurate are 
your values for a and b? 


(Ps) 10 The masses, in kilograms, of the animals treated at a veterinary clinic in the past year are illustrated in a 
histogram. The histogram has four columns of equal height. The following table shows the class intervals and 
the number of animals in two of the classes. 


Mass (kg) | 3-5 | 6-12 | 13-32 | 33-44
No. animals (f) | a | 371 | 1060 | b


a Find the value of a and of b, and show that a total of 2226 animals were treated at the clinic. 
b Calculate an estimate of the lower boundary of the masses of the heaviest 50% of these animals. 

(Ps) 11 The minimum daily temperature at a mountain village was recorded to the nearest 0.5°C on 200 consecutive 
days. The results are grouped into a frequency table and a histogram is drawn. 
The temperatures ranged from 0.5°C to 2°C on n days, and this class is represented by a column of height h cm.
The temperatures ranged from -2.5 to -0.5°C on d days. Find, in terms of n, h and d, the height of the 
column that represents these temperatures. 

(Ps) 12 The frequency densities of the four classes in a histogram are in the ratio 4:3:2:1. The frequencies of these 
classes are in the ratio 10:15:24:8. 


Find the total width of the histogram, given that the narrowest class interval is represented by a column of 
width 3cm. 
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p-50 51-70 71-80 


81-q


165 240 195 


147 


Given that the frequency densities of the four classes of percentage scores are in the ratio 5:8:13:7, find the value of p and of q.


Bar charts first appeared in a book by the Scottish political economist William Playfair, entitled The 
Commercial and Political Atlas (London, 1786). His invention was adopted by many in the following 
years, including Florence Nightingale, who used bar charts in 1859 to compare mortality in the 
peacetime army with the mortality of civilians. This helped to convince the government to improve 
army hygiene. 


In the past few decades histograms have played a very important role in image processing and 

computer vision. An image histogram acts as a graphical representation of the distribution of colour 
tones in a digital image. By making adjustments to the histogram, an image can be greatly enhanced. 
This has had great benefits in medicine, where scanned images are used to diagnose injury and illness. 


1.4 Representation of continuous data: cumulative frequency graphs 


A cumulative frequency graph can be used to represent continuous data. Cumulative 
frequency is the total frequency of all values less than a given value. 


If we are given grouped data, we can construct the cumulative frequency diagram by
plotting cumulative frequencies (abbreviated to cf) against upper class boundaries
for all intervals. We can join the points consecutively with straight-line segments to
give a cumulative frequency polygon or with a smooth curve to give a cumulative
frequency curve.


For example, a set of data that includes 100 values below 7.5 and 200 values below 9.5 will
have two of its points plotted at (7.5, 100) and at (9.5, 200).


We plot points at upper boundaries because we know the total frequencies up to these 
points are precise. 


From a cumulative frequency graph we can estimate the number or proportion of values 
that lie above or below a given value, or between two values. There is no rule for deciding 
whether a polygon or curve is the best type of graph to draw. It is often difficult to fit a 
smooth curve through a set of plotted points. Also, it is unlikely that any two people will 
draw exactly the same curve. A polygon, however, is not subject to this uncertainty, as we 
know exactly where the line segments must be drawn. 
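The construction described in this section (running totals plotted against upper class boundaries, starting from a cumulative frequency of 0 at the lowest boundary) can be sketched as follows. This is a minimal Python illustration; the function name `cumulative_points` and the sample data are assumptions made for the example.

```python
from itertools import accumulate


def cumulative_points(lowest_boundary, upper_boundaries, frequencies):
    """Return the (boundary, cumulative frequency) points to plot.

    The first point is the lowest class boundary with cumulative
    frequency 0; each later point sits at a class's upper boundary.
    """
    if len(upper_boundaries) != len(frequencies):
        raise ValueError("need one frequency per class")
    running_totals = accumulate(frequencies)
    return [(lowest_boundary, 0)] + list(zip(upper_boundaries, running_totals))


# Made-up grouped data: classes 5.5-7.5, 7.5-9.5 and 9.5-12.5.
points = cumulative_points(5.5, [7.5, 9.5, 12.5], [100, 100, 50])
print(points)  # [(5.5, 0), (7.5, 100), (9.5, 200), (12.5, 250)]
```

Joining these points in order with straight lines gives the polygon; a smooth curve through the same points gives the cumulative frequency curve.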
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In Chapter 2, 

Section 2.3 and in 
Chapter 8, Section 8.1, 
we will see how a 
histogram or bar chart 
can be used to show the 
shape of a set of data, 
and how that shape 
provides information 
on average values. 


A common mistake 
is to plot points at 
class mid-values but 


the total frequency up 
to each mid-value is 
not precise — it is an 
estimate. 
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By drawing a cumulative frequency polygon, we are making exactly the same assumptions 
that we made when we used a histogram to calculate estimates. This means that estimates 
from a polygon should match exactly with estimates from a histogram. 
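This agreement can be demonstrated with the data from Worked example 1.2. The Python sketch below is illustrative; `histogram_estimate` and `polygon_cf` are hypothetical names, and both functions assume values are spread evenly within each class.

```python
def histogram_estimate(classes, low, high):
    """Sum of column areas between low and high."""
    return sum(
        max(0.0, min(high, upper) - max(low, lower)) * freq / (upper - lower)
        for lower, upper, freq in classes
    )


def polygon_cf(classes, x):
    """Cumulative frequency read from the polygon at x (linear between points)."""
    cf = 0.0
    for lower, upper, freq in classes:
        if x >= upper:
            cf += freq
        elif x > lower:
            cf += freq * (x - lower) / (upper - lower)
    return cf


# Classes from Worked example 1.2: (lower, upper, frequency).
masses = [(40, 50, 40), (50, 70, 60)]

from_histogram = histogram_estimate(masses, 45, 63)
from_polygon = polygon_cf(masses, 63) - polygon_cf(masses, 45)
print(from_histogram, from_polygon)  # 59.0 59.0
```

Both routes use straight-line interpolation inside a class, which is why the two estimates coincide.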


WORKED EXAMPLE


The following table shows the lengths of 80 leaves from a particular tree, given to the nearest centimetre.


Length (cm) | 1-2 | 3-4 | 5-7 | 8-9 | 10-11
No. leaves (f) | 8 | 20 | 38 | 10 | 4


Draw a cumulative frequency curve and a cumulative frequency polygon. Use 
each of these to estimate: 


a the number of leaves that are less than 3.7 cm long 


b the lower boundary of the lengths of the longest 22 leaves. 


Answer

Before the diagrams can be drawn, we must organise the given data to show upper class boundaries and cumulative frequencies.

Length (l cm) | Addition of frequencies | Cumulative frequency (cf)
l < 0.5 | 0 | 0
l < 2.5 | 0 + 8 | 8
l < 4.5 | 0 + 8 + 20 | 28
l < 7.5 | 0 + 8 + 20 + 38 | 66
l < 9.5 | 0 + 8 + 20 + 38 + 10 | 76
l < 11.5 | 0 + 8 + 20 + 38 + 10 + 4 | 80

76 can also be calculated as 66 + 10, using the previous cf value.

We plot points at (0.5, 0), (2.5, 8), (4.5, 28), (7.5, 66), (9.5, 76) and (11.5, 80). We then join them in order with ruled lines and also with a smooth curve, to give the two types of graph.

[Graph: cumulative frequency (No. leaves) against length (l cm), showing the polygon and the curve.]

a The polygon gives an estimate of 20 leaves. The curve gives an estimate of 18 leaves.

b The polygon gives an estimate of 6.9 cm. The curve gives an estimate of 6.7 cm.
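The polygon estimates can be reproduced by linear interpolation between the plotted points. The following Python sketch is illustrative (the names `cf_at` and `value_at` are assumptions); it applies only to the polygon, since the curve depends on how it is drawn by hand.

```python
# Plotted points for the leaf lengths: (upper boundary in cm, cf).
points = [(0.5, 0), (2.5, 8), (4.5, 28), (7.5, 66), (9.5, 76), (11.5, 80)]


def cf_at(points, x):
    """Read the cumulative frequency at x from the polygon."""
    for (x0, c0), (x1, c1) in zip(points, points[1:]):
        if x0 <= x <= x1:
            return c0 + (c1 - c0) * (x - x0) / (x1 - x0)
    raise ValueError("x lies outside the plotted points")


def value_at(points, cf):
    """Read the value at which the polygon reaches cumulative frequency cf."""
    for (x0, c0), (x1, c1) in zip(points, points[1:]):
        if c0 <= cf <= c1 and c1 > c0:
            return x0 + (x1 - x0) * (cf - c0) / (c1 - c0)
    raise ValueError("cf lies outside the plotted points")


print(round(cf_at(points, 3.7)))            # a: 20 leaves
print(round(value_at(points, 80 - 22), 1))  # b: 6.9 cm
```

For part b, the longest 22 leaves begin where the cumulative frequency reaches 80 - 22 = 58.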
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In Worked example 
1.2, we made estimates 
from a histogram by 
assuming that the 
values in each class are 
spread evenly over the 
whole class interval. 


* Do include the lowest 
class boundary, which 
has a cumulative 
frequency of 0, and 
plot this point on 
your graph. 


The dotted lines show 
the workings for parts 
a and b. 


When constructing 
a cumulative 
frequency graph, 
you are advised to 
use sensible scales 


that allow you to 
plot and read values 
accurately. 


Estimates from a 

polygon and a curve 
will not be the same, 
as they coincide only 
at the plotted points. 


Note that all of 
these answers are 
only estimates, as 

we do not know the 
exact shape of the 
cumulative frequency 
graph between the 
plotted points. 


Chapter 1: Representation of data 


Four histograms (1-4) that represent four different sets of data with equal-width intervals are shown.

[Figure: four histograms, numbered 1 to 4.]

a Which of the following cumulative frequency graphs (A-D) could represent the same set of data as each of the histograms (1-4) above?

[Figure: four cumulative frequency graphs, labelled A to D.]

How do column heights affect the shape of a cumulative frequency graph?

b Why could a cumulative frequency graph never look like the sketch shown below?

[Figure: sketch of a cumulative frequency graph.]

We will use cumulative frequency graphs to estimate the median, quartiles and percentiles of a set of data in Chapter 2, Section 2.3 and in Chapter 3, Section 3.2.


You can investigate the 
relationship between 
histograms and 
cumulative frequency 
graphs by visiting the 
Cumulative Frequency 
Properties resource on 
the Geogebra website 
(www.geogebra.org). 
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EXERCISE 1C 


1 The reaction times, t seconds, of 66 participants were measured in an experiment and presented below. 


t<1.5 0 
t<3.0 3 
t<4.5 8 
t<6.5 32 
t<8.5 54 
t<11.0 62 
t<13.0 66 


a Draw a cumulative frequency polygon to represent the data. 
b Use your graph to estimate: 
i the number of participants with reaction times between 5.5 and 7.5 seconds 


ii the lower boundary of the slowest 20 reaction times. 


2 The following table shows the widths of the 70 books in one section of a library, given to the nearest 
centimetre. 


Width (cm) | 10-14 | 15-19 | 20-29 | 30-39 | 40-44
No. books (f) | 3 | 13 | 25 | 24 | 5


a Given that the upper boundary of the first class is 14.5cm, write down the upper boundary of the second class. 


b Draw up a cumulative frequency table for the data and construct a cumulative frequency graph.
c Use your graph to estimate: 

i the number of books that have widths of less than 27cm 

ii the widths of the widest 20 books. 


3 Measurements of the distances, x mm, between two moving parts inside car engines were recorded and are summarised in the following table. There were 156 engines of type A and 156 engines of type B.


Distance (x mm) | x < 0.10 | x < 0.35 | x < 0.60 | x < 0.85 | x < 1.20
Type A (cf) | 0 | 16 | 84 | 134 | 156
Type B (cf) | 0 | 8 | 52 | 120 | 156


a Draw and label two cumulative frequency curves on the same axes. 

b Use your graphs to estimate: 
i the number of engines of each type with measurements between 0.30 and 0.70 mm 
ii the total number of engines with measurements that were less than 0.55 mm. 


c Both types of engine must be repaired if the distance between these moving parts is more than a certain 
fixed amount. Given that 16 type A engines need repairing, estimate the number of type B engines that 
need repairing. 
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4 The diameters, dcm, of 60 cylindrical electronic components are represented in the following cumulative 
frequency graph. 


[Cumulative frequency graph: cumulative frequency (No. components), marked up to 60, against diameter (d cm), marked from 0.1 to 0.6.]


a Find the number of components such that 0.2 < d < 0.4, and explain why your answer is not an estimate.
b Estimate the number of components that have:

i a diameter of less than 0.15 cm ii a radius of 0.16 cm or more.
c Estimate the value of k, given that 20% of the components have diameters of k mm or more.


d Give the reason why 0.1—0.2cm is the modal class. 


5 The following cumulative frequency graph shows the masses, m grams, of 152 uncut diamonds. 


[Cumulative frequency graph: cumulative frequency against mass (m grams); horizontal axis marked from 4 to 24 in steps of 4.]


a Estimate the number of uncut diamonds with masses such that:
i 9 < m < 17 ii 7.2 < m < 15.6.


b The lightest 40 diamonds are classified as small. The heaviest 40 diamonds are classified as large. Estimate the difference between the mass of the heaviest small diamond and the lightest large diamond.
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c The point marked ai P(24,152) on the graph indicates that the 152 uncut diamonds all have masses of less 
than 24 grams. 


Each diamond is now cut into two parts of equal mass. Assuming that there is no wastage of material, 
write down the coordinates of the point corresponding to P on a cumulative frequency graph representing 
the masses of these cut diamonds. 


6 The densities, d g/cm³, of 125 chemical compounds are given in the following table.


Density (d g/cm³) | d < 1.30 | d < 1.38 | d < 1.42 | d < 1.58 | d < 1.70
No. compounds (cf) | 32 | 77 | 92 | 107 | 125


Find the frequencies a, b, c and d given in the table below. 


0- 
a 


1.42-1.70 


7 The daily journey times for 80 bank staff to get to work are given in the following table.


Time (t min) | t < 10 | t < 15 | t < 20 | t < 25 | t < 30 | t < 45 | t < 60
No. staff (cf) | 3 | 11 | 24 | 56 | 68 | 76 | 80


a How many staff take between 15 and 45 minutes to get to work?

b Find the exact number of staff who take $\frac{x+y}{2}$ minutes or more to get to work, given that 85% of the staff take less than x minutes and that 70% of the staff take y minutes or more.


8 A fashion company selected 100 12-year-old boys and 100 12-year-old girls to audition as models. The heights, h cm, of the selected children are represented in the following graph.

[Cumulative frequency graphs for boys and for girls: cumulative frequency against height (h cm); horizontal axis marked at 0, 140, 150, 160, 170 and 180.]

a What features of the data suggest that the children were not selected at random?

b Estimate the number of girls who are taller than the shortest 50 boys.

c What is the significance of the value of h where the graphs intersect?

d The shortest 75 boys and tallest 75 girls were recalled for a second audition. On a cumulative frequency graph, show the heights of the children who were not recalled.
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9 The following table shows the ages of the students currently at a university, given by percentage. Ages are 
rounded down to the number of whole years. 


20-21 22-24 25-28 29-35 36-44 


51 11 5 4 2 


a Represent the data in a percentage cumulative frequency polygon. 


b The oldest 8% of these students qualify as ‘mature’. Use your polygon to estimate the minimum age 
requirement for a student to be considered mature. Give your answer to the nearest month. 


c Of the 324 students who are 18-19 years old, 54 are not expected to find employment within 3 months of 
finishing their course. 


i Calculate an estimate of the number of current students who are expected to find employment within 
3 months of finishing their course. 


ii What assumptions must be made to justify your calculations in part c i? Are these assumptions
reasonable? Do you expect your estimate to be an overestimate or an underestimate?


(Ps) 10 The distances, in km, that 80 new cars can travel on 1 litre of fuel are shown in the table.


Distance (km) | 4.4- | 6.6- | 8.8- | 12.1- | 15.4-18.7
No. cars (f) | 5 | 7 | 52 | 12 | 4


These distances are 10% greater than the distances the cars will be able to travel after they have covered more 
than 100000km. 


Estimate how many of the cars can travel 10.5km or more on 1 litre of fuel when new, but not after they have 
covered more than 100000 km. 


(Ps) 11 A small company produces cylindrical wooden pegs for making garden chairs. The lengths and diameters of the 242 pegs produced yesterday have been measured independently by two employees, and their results are given in the following table.


d<1.5 d < 2.0 d<2.5 d<3.0 


60 182 222 242 


a On the same axes, draw two cumulative frequency graphs: one for lengths and one for diameters. 


b Correct to the nearest millimetre, the lengths and diameters of n of these pegs are equal. Find the least 
and greatest possible value of n. 


c A peg is acceptable for use when it satisfies both l ≥ 2.8 and d < 2.2.


Explain why you cannot obtain from your graphs an accurate estimate of the number of these 242 pegs 
that are acceptable. Suggest what the company could do differently so that an accurate estimate of the 
proportion of acceptable pegs could be obtained. 
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The following table shows data on the masses, m grams, of 150 objects. 


Mass (m g) | m < 0 | m < 12 | m < 30 | m < 53 | m < 70 | m < 80
No. objects (cf) | 0 | 24 | 60 | 106 | 138 | 150


By drawing a cumulative frequency polygon, the following estimates will be obtained:
a Number of objects with masses less than 20 g = 40 objects.
b Arranged in ascending order, the mass of the 100th object = 50 g.

The six plotted points, whose coordinates you know, are joined by straight lines.

However, we can calculate these estimates from the information given in the table without drawing the polygon.


Investigate the possible methods that we can use to calculate these two estimates.
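One possible method, offered here only as a sketch, is to work in proportional parts along the straight line joining the two plotted points on either side of the value of interest. The Python arithmetic below uses the cumulative frequencies from the table; the variable names are illustrative.

```python
# Cumulative frequency points from the table: (mass in g, cf).
# a: 20 g lies between the points (12, 24) and (30, 60).
below_20 = 24 + (20 - 12) / (30 - 12) * (60 - 24)

# b: a cumulative frequency of 100 lies between (30, 60) and (53, 106).
mass_100th = 30 + (100 - 60) / (106 - 60) * (53 - 30)

print(round(below_20, 6), round(mass_100th, 6))  # 40.0 50.0
```

Other approaches, such as using frequency densities of the individual classes, should lead to the same two estimates.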


1.5 Comparing different data representations 

Pictograms, vertical line graphs, bar charts and pie charts are useful ways of displaying qualitative data and ungrouped quantitative data, and people generally find them easy to understand. Nevertheless, it may be of benefit to group a set of raw data so that we can see how the values are distributed. Knowing the proportion of small, medium and large values, for example, may prove to be useful. For small datasets we can do this by constructing stem-and-leaf diagrams, which have the advantage that raw values can still be seen after grouping.


In large datasets individual values lose their significance and a picture of the whole is more informative. We can use frequency tables to make a compact summary by grouping but most people find the information easier to grasp when it is shown in a graphical format, which allows absolute, relative or cumulative frequencies to be seen. Although some data are lost by grouping, histograms and cumulative frequency graphs have the advantage that data can be grouped into classes of any and varied widths.


The choice of which representation to use will depend on the type and quantity of data, the audience and the objectives behind making the representation. Most importantly, the representation must show the data clearly and should not be misleading in any way.


FAST FORWARD
In Chapter 2 and Chapter 3, we will see how grouping affects the methods we use to find measures of central tendency and measures of variation.


The following chart is a guide to some of the most commonly used methods of data 
representation. 


Qualitative data; ungrouped discrete quantitative data: pictogram, vertical line graph, bar chart, pie chart, sectional bar chart
Discrete quantitative data or continuous data, small amount: stem-and-leaf diagram
Grouped discrete quantitative data or continuous data, large amount: histogram, cumulative frequency graph
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EXERCISE 1D 


1 Jamila noted each student’s answer when her year group was asked to name their favourite colour.
a List the methods of representation that would be suitable for displaying Jamila’s data. 


b Jamila wishes to emphasise that the favourite colour of exactly three-quarters of the students is blue. 
Which type of representation from your list do you think would be the most effective for Jamila to use? 
Explain why you have chosen this particular type of representation. 


2 A large number of chickens’ eggs are individually weighed. The masses are grouped into nine classes, each of width 2 grams, from 48 to 66 g.

Name a type of representation in which the fact could be seen that the majority of the eggs have masses from 54 to 60 g. Explain how the representation would show this.

3 Boxes of floor tiles are to be offered for sale at a special price of $75. The boxes claim to contain at least 100 tiles each.


a Why would it be preferable to use a stem-and-leaf diagram rather than a bar chart to represent the numbers of tiles, which are 112, 116, 107, 108, 121, 104, 111, 106, 105 and 110?


b How may the seller benefit if the numbers 12, 16, 7, 8, 21, 4, 11, 6, 5 and 10 are used to draw the stem-and-leaf diagram instead of the actual numbers of tiles?


4 A charity group’s target is to raise a certain amount of money in a year. At the end of the first month the group raised 36% of the target amount, and at the end of each subsequent month they manage to raise exactly half of the amount outstanding.


a How many months will it take the group to raise 99% of the money?
b Name a type of representation that will show that the group fails to reach its target by the end of the year. Explain how this fact would be shown in the representation.


5 University students measured the heights of the 54 trees in the grounds of a primary school. As part of a talk on conservation at a school assembly, the students have decided to present their data using one of the following diagrams.


[Figure: diagrams of the heights of 54 trees, including a chart with the classes 2 to 3 m, 3 to 5 m and 5 to 8 m, a chart with categories such as short and medium, and a histogram of frequency density against heights of trees (m).]


a Give one disadvantage of using each of the representations shown. 


b Name and describe a different type of representation that would be appropriate for the audience, and that 
has none of the disadvantages given in part a. 
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6 The percentage scores of 40 candidates who took a Health and Safety test are given: 
77 44 65 84 52 60 35 83 68 66 50 68 65 57 60 50 93 38 46 55 
45 69 61 64 40 66 91 59 61 74 70 75 42 65 85 63 73 84 68 30 


a Construct a frequency table by grouping the data into seven classes with equal-width intervals, where the 
first class is 30-39. Label this as Table 1. 


b It is proposed that each of the 40 candidates is awarded one of three grades, A, B or C. Construct a new frequency table that matches with this proposal. Label this as Table 2.


c A student plans to display all three versions of the data (i.e. the raw data, the data in Table 1 and the data in Table 2) in separate stem-and-leaf diagrams.


For which version(s) of the data would this not be appropriate? Suggest an alternative type of representation in each case.


(M) 7 Last year Tom renovated an old building during which he worked for at least 9 hours each week. By plotting 
four points in a graph, he has represented the time he spent working. 


[Graph: cf (No. weeks) against time spent working (hours), showing four plotted points.]


a What can you say about the time that Tom spent working on the basis of this graph? 
b Explain why Tom’s graph might be considered to be misleading. 


c Name the different types of representation that are suitable for displaying the amount of time that Tom 
worked each week throughout the year. 


Consider the benefits of each type of representation and then fully describe (but do not draw) the one you believe to be the most suitable.


(Ps) 8 The following table shows the focal lengths, l mm, of the 84 zoom lenses sold by a shop. For example, there are 18 zoom lenses that can be set to any focal length between 24 and 50 mm.


Focal length (mm) | 24-50 | 50-108 | 100-200 | 150-300 | 250-400
No. lenses (f) | 18 | 30 | 18 | 12 | 6


a What feature of the data does not allow them to be displayed in a histogram?


b What type of diagram could you use to illustrate the data? Explain clearly how you would do this. 
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(M) 9 The following table shows estimates, in hundred thousands, of the number of people living in poverty and the 
populations, in millions, of the countries where they live. (Source: World Bank 2011/12) 


Represent, in a single diagram, the actual numbers and the relative poverty that exists in these countries. 


In what way do the two sets of data in your representation give very different pictures of the poverty levels 
that exist? Which is the better representation to use and why? 


EXPLORE 1.4 


Past, present and predicted world population figures by age group, sex and other 
categories can be found on government census websites. 


You may be interested, for example, in the population changes for your own age 
group during your lifetime. This is something that can be represented in a diagram, 
either manually, using spreadsheet software or an application such as GeoGebra, and 
for which you may like to try making predictions by looking for trends shown in the 
raw data or in any diagrams you create. 


Checklist of learning and understanding 


Non-numerical data are called qualitative or categorical data.

Numerical data are called quantitative data, and are either discrete or continuous.

● Discrete data can take only certain values.

● Continuous data can take any value, possibly within a limited range.

Data in a stem-and-leaf diagram are ordered in rows with intervals of equal width.

In a histogram, column area ∝ frequency, and the vertical axis is labelled frequency density.

● $\text{Frequency density} = \dfrac{\text{class frequency}}{\text{class width}}$ and $\text{Class frequency} = \text{class width} \times \text{frequency density}$.

In a cumulative frequency graph, points are plotted at class upper boundaries.
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END-OF-CHAPTER REVIEW fékcise1 | 


1 The weights of 220 sausages are summarised in the following table.


Weight (grams) | <20 | <30 | <40 | <45 | <50 | <60 | <70
Cumulative frequency | 0 | 20 | 50 | 100 | 160 | 210 | 220
i State how many sausages weighed between 50g and 60g. [1] 
ii On graph paper, draw a histogram to represent the weights of the sausages. [4] 


Cambridge International AS & A Level Mathematics 9709 Paper 62 Q4 November 2011 [ Adapted] 


2 The lengths of children’s feet, in centimetres, are classified as 14-16, 17-19, 20-22 and so on. State the lower class boundary, the class width and the class mid-value for the lengths given as 17-19. [2]


3 The capacities of ten engines, in litres, are given rounded to 2 decimal places, as follows:
1.86, 2.07, 1.74, 1.08, 1.99, 1.96, 1.83, 1.45, 1.28 and 2.19.

These capacities are to be grouped in three classes as 1.0-1.4, 1.5-2.0 and 2.1-2.2.


a Find the frequency of the class 1.5-2.0 litres. [1]
b Write down two words that describe the type of data given about the engines. [2]


4 Over a 19-day period, Alina recorded the number of text messages she received each day. The following stem-and-leaf diagram shows her results.

0 | 9
1 | 0 3 4 4
1 | 5 5 5 6 8
Key: 1 | 5 represents 15 messages

a On how many days did she receive more than 10 but not more than 15 messages? [1]


b How many more rows would need to be added to the stem-and-leaf diagram if Alina included 
data for two extra days on which she received 4 and 36 messages? Explain your answer. [2] 


5 The following stem-and-leaf diagram shows eight randomly selected numbers between 2 and 4.

2 3a
3 67
Key: 2 | 1 represents 2.1


Given that a - b = 7 and that the sum of the eight numbers correct to the nearest integer is 24, find the value of a and of b. [3]


Eighty people downloaded a particular application and recorded the time taken for the download to complete. 
The times are given in the following table. 


lya <3 <5 <6 <10 
© 6 18 66 80 
a Find the number of downloads that comnleted in 5 to 6 minutes. [1] 


b On a histogram, the download times from 5 to 6 minutes are represented by a column of height 9.6 cm.
Find the height of the column that represents the download times of 6 to 10 minutes. [2] 


A histogram is drawn with three columns whose widths are in the ratio 1:2 : 4. The frequency densities of these 
classes are in the ratio 16:12 : 3, respectively. 


a Given that the total frequency of the data is 390, find the frequency of each class. [3] 


b The classes with the two highest frequencies are to be merged and a new histogram drawn. Given that 
the height of the column representing the merged classes is to be 30cm, find the correct height for the 
remaining column. [3] 


c Explain what problems you would encounter if asked to construct a histogram in which the classes with 
the two lowest frequencies are to be merged. [1]
8 The histograms below illustrate the number of hours of sunshine during August in two regions, A and B.
Neither region had more than 8 hours of sunshine per day. 


[Figure: two histograms, Region A and Region B. Horizontal axis of each: Hours of sunshine, from 0 to 8.]
a Explain how you know that some information for one of the regions has been omitted. [2] 


b After studying the histograms, two students make the following statements. 


• Bindu: There was more sunshine in region A than in region B during the first 2 weeks of August.
• Janet: In August there was less sunshine in region A than in region B.
Discuss these statements and decide whether or not you agree with each of them. 


In each case, explain your reasoning. [3] 


9 A hotel has 90 rooms. The table summarises information about the number of rooms occupied each day for a
period of 200 days. 


Number of rooms occupied | 1–20 | 21–40 | 41–50 | 51–60 | 61–70 | 71–90
Frequency (days) | 10 | 32 | 62 | 50 | 28 | 18
i Draw a cumulative frequency graph on graph paper to illustrate this information. [4] 
ii Estimate the number of days when more than 30 rooms were occupied. [2] 
iii On 75% of the days at most n rooms were occupied. Estimate the value of n. [2] 
Measures of central tendency


In this chapter you will learn how to:
■ find and use different measures of central tendency
■ calculate and use the mean of a set of data (including grouped data) either from the data itself or
from a given total $\Sigma x$ or a coded total $\Sigma(x - b)$ and use such totals in solving problems that may


Chapter 2: Measures of central tendency 


PREREQUISITE KNOWLEDGE


Where it comes from | What you should be able to do | Check your skills
IGCSE / O Level Mathematics | Calculate the mean, median and mode for individual and discrete data. | 1 Find the mean, median and mode of the numbers 7.3, 3.9, 1.3, 6.6, 9.2, 4.7, 3.9 and 3.1.
IGCSE / O Level Mathematics | Use a calculator efficiently and apply appropriate checks of accuracy. | 2 Use a calculator to evaluate $\frac{6 \times 1.7 + 8 \times 1.9 + 11 \times 2.1}{6 + 8 + 11}$ and then check that your answer is reasonable.


Three types of average 
There are three measures of central tendency that are commonly used to describe the 
average value of a set of data. These are the mode, the mean and the median. 


• The mode is the most commonly occurring value.
• The mean is calculated by dividing the sum of the values by the number of values.
• The median is the value in the middle of an ordered set of data.


We use an average to summarise the values in a set of data. As a representative value, it 
should be fairly central to, and typical of, the values that it represents. 


If we investigate the annual incomes of all the people in a region, then a single value (i.e.
an average income) would be a convenient number to represent our findings. However,
choosing which average to use is something that needs to be thought about, as one measure
may be more appropriate to use than the others.


Deciding which measure to use depends on many factors. Although the mean is the
most familiar average, a shoemaker would prefer to know which shoe size is the most
popular (i.e. the mode). A farmer may find the median number of eggs laid by their
chickens to be the most useful because they could use it to identify which chickens are
profitable and which are not. As for the average income in our chosen region, we must
also consider whether to calculate an average for the workers and managers together or
separately; and, if separately, then we need to decide who fits into which category.


Various sources tell us that the average person:

• laughs 10 times a day
• falls asleep in 7 minutes
• sheds 0.7 kg of skin each year
• grows 944 km of hair in a lifetime
• produces a sneeze that travels at 160 km/h
• has over 97 000 km of blood vessels in their body
• has a vocabulary between 5000 and 6000 words.
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The average adult male is 172.5cm tall and weighs 80kg. 

The average adult female is 159cm tall and weighs 68 kg. 

What might each of these statements mean and how might they have been determined?
Have you ever met such an average person? How could this information be useful? 


You can find a variety of continuously updated figures that yield interesting averages 
at http://www.worldometers.info/. 


2.1 The mode and the modal class 
As you will recall, a set of data may have more than one mode or no mode at all. 


The following table shows the scores on 25 rolls of a die, where 2 is the mode because it has 
the highest frequency. 


Score on die | 1 | 2 | 3 | 4 | 5 | 6
Frequency | 5 | 6 | 5 | 3 | 2 | 4


In a set of grouped data in which raw values cannot be seen, we can find the modal class, 
which is the class with the highest frequency density. 


WORKED EXAMPLE 2.1


Find the modal class of the 270 pencil lengths, given to the nearest centimetre
in the following table.


Answer


Length (x cm) | No. pencils (f) | Width (cm) | Frequency density
3.5 ≤ x < 7.5 | 100 | 4 | 100 ÷ 4 = 25
7.5 ≤ x < 10.5 | 90 | 3 | 90 ÷ 3 = 30
10.5 ≤ x < 12.5 | 80 | 2 | 80 ÷ 2 = 40

Class boundaries, class widths and frequency density calculations are shown in the table.

[Figure: histogram of the pencil lengths. Horizontal axis: Length (cm), with class boundaries at 3.5, 7.5, 10.5 and 12.5.]

Although the histogram shown is not needed to answer this question, it is useful
to see that, in this case, the modal class is the one with the tallest column
and the greatest frequency density, even though it has the lowest frequency.


The modal class is 11–12 cm (or, more accurately, 10.5 ≤ x < 12.5 cm).
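
The comparison of frequency densities in Worked example 2.1 can be checked with a short Python sketch. The function names `frequency_density` and `modal_class` are our own illustrative choices, not part of the coursebook; the class boundaries and frequencies are those of the worked example.

```python
# Each class is (lower boundary, upper boundary, frequency), from Worked example 2.1.
classes = [(3.5, 7.5, 100), (7.5, 10.5, 90), (10.5, 12.5, 80)]


def frequency_density(lower, upper, frequency):
    """Frequency per unit of class width."""
    return frequency / (upper - lower)


def modal_class(classes):
    """Return the boundaries of the class with the greatest frequency density."""
    lower, upper, _ = max(classes, key=lambda c: frequency_density(*c))
    return (lower, upper)


for lower, upper, frequency in classes:
    print(lower, upper, frequency_density(lower, upper, frequency))

print(modal_class(classes))  # (10.5, 12.5)
```

Note that the class chosen has the lowest frequency (80) but the greatest number of pencils per centimetre.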
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Q 


We saw how to calculate frequency density in Chapter 1, Section 1.3.

In histograms, the modal class has the greatest column height. If there is no
modal class then all classes have the same frequency density.

The modal class does not contain the most pencils but it does contain the greatest
number of pencils per centimetre.
WORKED EXAMPLE 2.2


Two classes of data have interval widths in the ratio 3:2. Given that there is no 
modal class and that the frequency of the first class is 48, find the frequency of the 
second class. 


Answer 


Let the frequency of the second class be x.

Because there is no modal class, the two frequency densities are equal, so the
frequencies are in the same ratio as the class widths.

48 : x = 3 : 2
$\frac{48}{x} = \frac{3}{2}$
x = 32

The second class has a frequency of 32.

Or, let the frequency of the second class be x and equate the frequency densities.

$\frac{x}{2} = \frac{48}{3}$
x = 32

In the special case where all classes have equal widths, frequency densities
are proportional to frequencies, so the modal class is the class with the highest
frequency.


EXERCISE 2A 


1 Find the mode(s) of the following sets of numbers. 


a 12,15,11,7, 4,10, 32, 14, 6, 13,19, 3 b 19,21, 23, 16, 35, 8, 21, 16, 13,17, 12,19, 14,9 


2 Which of the eleven words in this sentence is the mode? 
3 Identify the mode of x and of y in the following tables. 
4 5 6 7 8 4 
1 J 5 6 4 27 
4 Find the modal class for x and for y in the following tables. 
Ge | o 4- 14-20 3-6 7-11 12-20 
/ 5 9 8 66 80 134 


5 A small company sells glass, which it cuts to size to fit into window frames. How could the company benefit 
from knowing the modal size of glass its customers purchase? 


6 Four classes of continuous data are recorded as 1-7, 8-16, 17—20 and 21-25. The class 1-7 has a frequency of 
84 and there is no modal class. Find the total frequency of the other three classes. 


PS 7 Data about the times, in seconds, taken to run 100 metres by n adults are given in the following table.


Time (x s) | 13.6 ≤ x < 15.4 | 15.4 ≤ x < 17.4 | 17.4 ≤ x < 19.8
No. adults (f) | a | b | 27


By first investigating the possible values of a and of b, find the largest possible value of n, given that the modal 
class contains the slowest runners. 
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(Ps] 8 Three classes of continuous data are given as 0—4, 4-10 and 10-18. The frequency densities of the classes 
0—4 and 10-18 are in the ratio 4:3 and the total frequency of these two classes is 120. Find the least possible 
frequency of the modal class, given that the modal class is 4—10. 


2.2 The mean 


The mean is referred to more precisely as the arithmetic mean and it is the most commonly
known average. The sum of a set of data values can be found from the mean. Suppose, for
example, that 12 values have a mean of 7.5:

$\text{Mean} = \frac{\text{sum of values}}{\text{number of values}}$, so $7.5 = \frac{\text{sum of values}}{12}$ and sum of values = 7.5 × 12 = 90.

You will soon be performing calculations involving the mean, so here we introduce
notation that is used in place of the word definition used above.

We use the upper case Greek letter 'sigma', written $\Sigma$, to represent 'sum' and $\bar{x}$ to represent
the mean, where x represents our data values.


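The notation used for ungrouped and for grouped data is shown on separate rows
in the following table.


 | Data values | Frequency of data values | Number of data values | Sum of data values | Mean
Ungrouped | x | | n | $\Sigma x$ | $\bar{x} = \frac{\Sigma x}{n}$
Grouped | x | f | $\Sigma f$ | $\Sigma xf$ or $\Sigma fx$ | $\bar{x} = \frac{\Sigma xf}{\Sigma f}$

$n = \Sigma f$ for grouped data.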

WORKED EXAMPLE 2.3


Five labourers, whose mean mass is 70.2 kg, wish to go to the top of a building in
a lift with some cement. Find the greatest mass of cement they can take if the lift
has a maximum weight allowance of 500 kg.


Answer

$\Sigma x = \bar{x} \times n = 70.2 \times 5 = 351$ kg
We first rearrange the formula $\bar{x} = \frac{\Sigma x}{n}$ to find the sum of the labourers' masses.

351 + y = 500
y = 149
We now form an equation using y to represent the greatest possible mass of cement.

The greatest mass of cement they can take is 149 kg.


$\Sigma xf = \Sigma fx$ indicates the sum of the products of each value and its frequency. For
example, the sum of five 10s and six 20s is (10 × 5) + (20 × 6) = (5 × 10) + (6 × 20) = 170.


KEY POINT 2.2

For ungrouped data, the mean is $\bar{x} = \frac{\Sigma x}{n}$.
For grouped data, $\bar{x} = \frac{\Sigma xf}{\Sigma f}$ or $\frac{\Sigma fx}{\Sigma f}$.
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WORKED EXAMPLE 2.4 


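Find the mean of the 40 values of x, given in the following table.


x | 31 | 32 | 33 | 34 | 35
f | 5 | 7 | 9 | 8 | 11


Answer

x | 31 | 32 | 33 | 34 | 35 |
f | 5 | 7 | 9 | 8 | 11 | $\Sigma f = 40$
xf | 155 | 224 | 297 | 272 | 385 | $\Sigma xf = 1333$


The mean, $\bar{x} = \frac{\Sigma xf}{\Sigma f} = \frac{1333}{40} = 33.325$.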
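
The calculation in Worked example 2.4 can be reproduced with a minimal Python sketch. The function name `mean_from_frequencies` is a hypothetical helper introduced here for illustration; it simply evaluates $\frac{\Sigma xf}{\Sigma f}$.

```python
def mean_from_frequencies(values, frequencies):
    """Mean of data given as values with frequencies: (sum of x*f) / (sum of f)."""
    sum_xf = sum(x * f for x, f in zip(values, frequencies))
    sum_f = sum(frequencies)
    return sum_xf / sum_f


# Data from Worked example 2.4.
values = [31, 32, 33, 34, 35]
frequencies = [5, 7, 9, 8, 11]

print(sum(frequencies))                                     # 40
print(sum(x * f for x, f in zip(values, frequencies)))      # 1333
print(round(mean_from_frequencies(values, frequencies), 3)) # 33.325
```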


Combined sets of data 


There are many different ways to combine sets of data. However, here we do this by simply
considering all of their values together. To find the mean of two combined sets, we divide
the sum of all their values by the total number of values in the two sets.


For example, by combining the dataset 1, 2, 3, 4 with the dataset 4, 5, 6, we obtain a new set
of data that has seven values in it: 1, 2, 3, 4, 4, 5, 6. Note that the value 4 appears twice.


Individually, the sets have means of $\frac{1+2+3+4}{4} = 2.5$ and $\frac{4+5+6}{3} = 5$. The combined sets
have a mean of $\frac{1+2+3+4+4+5+6}{4+3} = \frac{25}{7} = 3\frac{4}{7}$.

Note that the mean of the combined sets is not $\frac{2.5 + 5}{2}$.


WORKED EXAMPLE 2.5 


A large bag of sweets claims to contain 72 sweets, having a total mass of 852.4 g.
A small bag of sweets claims to contain 24 sweets, having a total mass of 282.8 g.


What is the mean mass of all the sweets together?


Answer

Total number of sweets = 72 + 24 = 96.
Total mass of sweets = 852.4 + 282.8 = 1135.2 g

Mean mass $= \frac{1135.2}{96} = 11.825$ g

Our answer assumes that the masses given are accurate to 1 decimal place; that
the numbers of sweets given are accurate; and that the masses of the bags are
not included in the given totals.


Copyright Material - Review Only - Not for Redistribution


Cambridge International AS & A Level Mathematics: Probability & Statistics 1


WORKED EXAMPLE 2.6 


A family has 38 films on DVD with a mean playing time of 1hour 32 minutes. They also have 26 films on video 
cassette, with a mean playing time of 2 hours 4 minutes. Find the mean playing time of all the films in their collection. 


Answer


38 + 26 = 64 films
(1 h 32 min × 38) + (2 h 4 min × 26) = (92 × 38) + (124 × 26) = 6720 min
We find the total number of films and their total playing time.

Mean playing time $= \frac{6720}{64}$ = 105 min or 1 h 45 min.
The 64 films have a total playing time of 6720 minutes.


In Worked example 2.6, the mean playing time of 105 minutes is not equal to $\frac{92 + 124}{2}$.

The mean of A and B $\neq \frac{\text{mean of } A + \text{mean of } B}{2}$, but this is not always the case.

The symbol ≠ means 'is not equal to'.

Suppose two sets of data, A and B, have m and n values with means $\frac{\Sigma A}{m}$ and $\frac{\Sigma B}{n}$,
respectively.

In what situations will the mean of A and B together be equal to
$\frac{\text{mean of } A + \text{mean of } B}{2}$?
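
The difference between the mean of the combined data and the simple average of the two means can be seen in a short Python sketch. The helper `combined_mean` is our own illustrative name; it takes each set as a (number of values, mean) pair and uses the figures from Worked example 2.6.

```python
def combined_mean(groups):
    """Mean of several sets taken together.

    groups is a list of (number of values, mean) pairs; each set's total
    is recovered as number of values * mean.
    """
    total = sum(n * mean for n, mean in groups)
    count = sum(n for n, _ in groups)
    return total / count


# Worked example 2.6: 38 DVDs averaging 92 min and 26 cassettes averaging 124 min.
films = [(38, 92), (26, 124)]

print(combined_mean(films))  # 105.0
print((92 + 124) / 2)        # 108.0, the simple average of the two means
```

Trying pairs of sets with different sizes and means in this sketch is one way to explore the question posed above.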


Means from grouped frequency tables 

When data are presented in a grouped frequency table or illustrated in a histogram or
cumulative frequency graph, we lose information about the raw values. For this reason we
cannot determine the mean exactly but we can calculate an estimate of the mean. We do
this by using mid-values to represent the values in each class.

We use the formula $\bar{x} = \frac{\Sigma xf}{\Sigma f}$, given in Key point 2.2, to calculate an estimate of the mean,
where x now represents the class mid-values.


WORKED EXAMPLE 2.7


Coconuts are packed into 75 crates, with 40 of a similar size in each crate.
46 crates contain coconuts with a total mass from 20 up to but not including 25 kg.
22 crates contain coconuts with a total mass from 25 up to but not including 40 kg.


7 crates contain coconuts with a total mass from 40 up to but not including 54kg. 
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a Calculate an estimate of the mean mass of a crate of coconuts. 


b Use your answer to part a to estimate the mean mass of a coconut. 


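Answer

a
Mass (kg) | 20– | 25– | 40–54 |
No. crates (f) | 46 | 22 | 7 | $\Sigma f = 75$
Mid-value (x) | 22.5 | 32.5 | 47.0 |
xf | 1035 | 715 | 329 | $\Sigma xf = 2079$

Estimate for mean, $\bar{x} = \frac{\Sigma xf}{\Sigma f} = \frac{2079}{75} = 27.72$ kg


b Estimate for the mean mass of a coconut $= \frac{27.72}{40}$
= 0.693 kg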
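
As a check on Worked example 2.7, the following Python sketch estimates a mean from grouped data by using class mid-values. The function name `estimated_mean` is an illustrative assumption; each class is entered as (lower boundary, upper boundary, frequency).

```python
def estimated_mean(classes):
    """Estimate the mean of grouped data using class mid-values.

    classes is a list of (lower boundary, upper boundary, frequency) tuples.
    """
    sum_xf = sum((lower + upper) / 2 * f for lower, upper, f in classes)
    sum_f = sum(f for _, _, f in classes)
    return sum_xf / sum_f


# Worked example 2.7: total masses of the crates, in kg.
crates = [(20, 25, 46), (25, 40, 22), (40, 54, 7)]

mean_crate = estimated_mean(crates)
mean_coconut = mean_crate / 40  # 40 coconuts in each crate

print(round(mean_crate, 2))    # 27.72
print(round(mean_coconut, 3))  # 0.693
```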


When gaps appear between classes of grouped data, class boundaries should be used to 
find class mid-values. The following example shows a situation in which using incorrect 
boundaries leads to an incorrect estimate of the mean. 


WORKED EXAMPLE 2.8 


Calculate an estimate of the mean age of a group of 50 students, where there are sixteen 
18-year-olds, twenty 19-year-olds and fourteen who are either 20 or 21 years old. 


Answer


Age (A years) | Mid-value (x) | No. students (f) | xf
18 ≤ A < 19 | 18.5 | 16 | 296
19 ≤ A < 20 | 19.5 | 20 | 390
20 ≤ A < 22 | 21.0 | 14 | 294
 | | $\Sigma f = 50$ | $\Sigma xf = 980$

Estimate of the mean age is $\frac{\Sigma xf}{\Sigma f} = \frac{980}{50}$
= 19.6 years

Incorrect mid-values of 18, 19 and 20.5 give an incorrect estimated mean of 19.1 years.

We will see how the mean is used to calculate the variance and standard deviation
of a set of data in Chapter 3, Section 3.3.
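
The effect of choosing the wrong mid-values in Worked example 2.8 can be seen by running both sets through the same calculation. This is a minimal Python sketch; the helper name `mean_from_mid_values` is our own.

```python
def mean_from_mid_values(mid_values, frequencies):
    """Estimated mean from class mid-values and class frequencies."""
    sum_xf = sum(x * f for x, f in zip(mid_values, frequencies))
    return sum_xf / sum(frequencies)


frequencies = [16, 20, 14]

# Ages are recorded in completed years, so the class boundaries are
# 18, 19, 20 and 22, giving mid-values 18.5, 19.5 and 21.
correct = mean_from_mid_values([18.5, 19.5, 21.0], frequencies)

# Treating the recorded ages 18, 19 and 20-21 as if they were the
# mid-values underestimates every age by half a year.
incorrect = mean_from_mid_values([18, 19, 20.5], frequencies)

print(round(correct, 1))    # 19.6
print(round(incorrect, 1))  # 19.1
```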




EXERCISE 2B 


1 Calculate the mean of the following sets of numbers: 
a 28,16,83, 72,105, 55, 6 and 35 
b  7.3,8.6,11.7,9.1,1.7 and 4.2 


c 34,54,94,-44 and 73. 


2 a The mean of 15, 31, 47, 83, 97, 119 and $p^2$ is 63. Find the possible values of p.
b The mean of 6, 29, 3, 14, q, (q + 8), $q^2$ and (10 − q) is 20. Find the possible values of q.


3 Given that:
a n = 14 and $\Sigma x = 325.5$, find $\bar{x}$.
b n = 45 and $\bar{y} = 23.6$, find the value of $\Sigma y$.
c $\Sigma z = 4598$ and $\bar{z} = 52.25$, find the number of values in the set of data.
d $\Sigma xf = 86$ and $\bar{x} = 74$, find the value of $\Sigma f$.
e $\Sigma f = 135$ and $\bar{x} = 0.842$, find the value of $\Sigma xf$.


4 Find the mean of x and of y given in the following tables. 


a
x | 18.0 | 18.5 | 19.0 | 19.5 | 20.0
f | 8 | 10 | 17 | 24 | 1

b
y | 3.62 | 3.65 | 3.68 | 3.71 | 3.74
f | 127 | 209 | 322 | 291 | 251


5 For the data given in the following table, it is given that g = 83. 


7 8 9 16 
9 13 a 11 


Calculate the value of a. 


6 Calculate an estimate of the mean of x and of y given in the following tables. 


a
x | 0 ≤ x < 2 | 2 ≤ x < 4 | 4 ≤ x < 8 | 8 ≤ x < 14
f | 8 | 9 | 11 | 2


b
y | 13 ≤ y < 16 | 16 ≤ y < 21 | 21 ≤ y < 28 | 28 ≤ y < 33 | 33 ≤ y < 36
f | 7 | 17 | 29 | 16 | 11


7 An examination was taken by 50 students. The 22 boys scored a mean of 71% and the girls scored a mean of 
76%. Find the mean score of all the students. 


8 A company employs 12 drivers. Their mean monthly salary is $1950. A new driver is employed and the mean 
monthly salary falls by $8. Find the monthly salary of the new driver. 


Q (M) 9 The mean age of the 16 members of a karate club is 26 years and 3 months. One member leaves the club and 


the mean age of those remaining is 26 years. Find the age of the member who left the club. Give a reason why 
your answer might not be very accurate.
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(M) 10 The following table shows the hourly rates of pay, in dollars, of a company’s employees. 
V 

pO i $ 

ANO) 8 


a Is the mean a good average to use here? Give a reason for your answer. 


7 8 109 
11 17 1 


b Find the mean rate of pay for the majority of the employees. 


PS) Q 11 A train makes a non-stop journey from one city to another and back again each day. Over a period of 
30 days, the mean number of passengers per journey is exactly 61.5. Exact one-way ticket prices paid by these 
passengers are given by percentage in the following table. 


34 
30 


a Calculate the total revenue from ticket sales, and explain why your answer is an approximation.


38 45 
41 29 


b The minimum and maximum possible revenues differ by $k. Find the value of k. 


12 The heights, in centimetres, of 54 children are represented in the following diagram.

The children are split into two equal-sized groups: a 'tall half' and a 'short half'.

Calculate an estimate of the difference between the mean heights of these two groups of children.

[Figure: diagram representing the heights of the 54 children. Horizontal axis: Height (cm).]


(M) 13 The following table summarises the number of tomatoes produced by the plants in the plots on a farm. 


No. tomatoes | 20–29 | 30–49 | 50–79 | 80–100
No. plots (f) | 329 | 413 | 704 | 258


a Calculate an estimate of the mean number of tomatoes produced by these plots. 


b The tomatoes are weighed accurately and their mean mass is found to be 156.50 grams. At market they
are sold for $3.20 per kilogram and the total revenue is $50 350. Find the actual mean number of tomatoes
produced per plot.


c Why could your answer to part b be inaccurate? 
14 Twenty boys and girls were each asked how many aunts and uncles 


they have. The entry 4/5 in the following table, for example, shows
that 4 boys and 5 girls each have 3 aunts and 2 uncles. 


a Find the mean number of uncles that the boys have. 


b For the boys and girls together, calculate the mean 
number of: 


i aunts ii aunts and uncles. 


c Suggest an alternative way of presenting the data so that the calculations in parts a and b would be 
simpler to make. 
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Ps) 15 A calculated estimate of the mean capacity of 120 refrigerators stored at a warehouse is 348 litres. The 
capacities are given in the following table. 


A delivery of n new refrigerators, all with capacities between 200 and 320 litres arrives at the warehouse. 
This causes the mean capacity to decrease by 8 litres. Find the value of n and state what assumptions you 
are making in your calculations. 


(Ps) 16 A carpet fitter is employed to fit carpet in each of the 72 guest bedrooms at a new hotel. The following table 
shows how many rooms were completed during the first 10 days of work.


5 6 or 7 


2 8 


Based on these figures, estimate how many more days it will take to finish the job. What assumptions are you 
making in your calculations? 


PS 17 In the figure opposite, a square of side 8 cm is joined edge-to-edge to a
semicircle, with centre O. P is 2 cm from O on the figure's axis of symmetry.

Points X and Y are fixed but the position of Z is variable on the shape's perimeter.
a Find the mean distance from P to X,Y and Z when angle POZ is equal to: 
i 180° ii 135°. 


b Find obtuse angle POZ, so that the mean distance from P to X,Y and Z 
is identical to the mean distance from P to X and Y. 


Six cards, numbered 1, 2, 3, 4, 5 and 6, are placed in a bag, as shown.


15 different pairs of cards can be selected without replacement from the bag. Three of these pairs are
{2, 3}, {6, 4} and {5, 1}.


Make a list of all 15 unordered pairs and find the mean of each. We will denote these mean values by $\bar{X}_2$.


Choose a suitable method to represent the values of $\bar{X}_2$ and
their frequencies. Find $\bar{\bar{X}}_2$, the mean of the values of $\bar{X}_2$.


Repeat the process described above for each of the following:

• the six possible selections of five cards, denoting their means by $\bar{X}_5$
• the 15 possible selections of four cards, denoting their means by $\bar{X}_4$
• the 20 possible selections of three cards, denoting their means by $\bar{X}_3$
• the six possible selections of one card, denoting their means by $\bar{X}_1$.


We will study the number of ways of selecting objects in Chapter 5, Section 5.3.
What does the single value of $\bar{X}_6$ represent?
Do the values of $\bar{\bar{X}}_1$, $\bar{\bar{X}}_2$, $\bar{\bar{X}}_3$, $\bar{\bar{X}}_4$ and $\bar{\bar{X}}_5$ have anything in common?
Can you suggest reasons for any of the common features that you observe?


Investigate the values of $\bar{\bar{X}}_r$ when there are a different number of consecutively
numbered cards in the bag.
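
If you would like to check your lists after completing the investigation by hand, the selections can be enumerated with Python's standard `itertools.combinations`. This is only a sketch: the function name `mean_of_selection_means` is our own, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction
from itertools import combinations


def mean_of_selection_means(cards, r):
    """Mean of the means of every selection of r cards, without replacement."""
    selection_means = [Fraction(sum(selection), r)
                       for selection in combinations(cards, r)]
    return sum(selection_means) / len(selection_means)


cards = [1, 2, 3, 4, 5, 6]

for r in range(1, 7):
    number_of_selections = len(list(combinations(cards, r)))
    print(r, number_of_selections, mean_of_selection_means(cards, r))
```

Changing `cards` to a different run of consecutive numbers lets you repeat the investigation for other bags.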


Coded data 


To code a set of data, we can transform all of its values by addition of a positive or negative 
constant. The result of doing this produces a set of coded data. 


One reason for coding is to make the numbers easier to handle when performing manual 
calculations. Also, it is sometimes easier to work with coded data than with the original 
data (by arranging the mean to be a convenient number, such as zero, for example). 


To find the mean of 101, 103, 104, 109 and 113, for example, we can use the values 1, 3, 4, 9 
and 13. 


Our x values are 101, 103, 104, 109 and 113, so 1, 3, 4, 9 and 13 are corresponding values of 
(x—100). 


Mean of the coded values is $\frac{\Sigma(x - 100)}{5} = \frac{1 + 3 + 4 + 9 + 13}{5} = 6$.

We subtracted 100 from each x value, so we simply add 100 to the mean of the coded
values to find the mean of x.

Mean(x) = mean(x − 100) + 100 = 106 or $\bar{x} = \frac{\Sigma(x - 100)}{5} + 100 = 106$


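Refer to the following diagram. If we add −b to the set of x values, they are all translated
by −b and so is their mean.


So, mean$(x - b) = \bar{x} - b$.


[Diagram: the values of x are translated by −b to give the values of x − b.]


For ungrouped data, $\bar{x} = \frac{\Sigma(x - b)}{n} + b$.


For grouped data, $\bar{x} = \frac{\Sigma(x - b)f}{\Sigma f} + b$.


These formulae can be summarised by writing $\bar{x} = \text{mean}(x - b) + b$.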
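
The relationship $\bar{x} = \text{mean}(x - b) + b$ can be confirmed numerically. The following Python sketch uses the five values from the example above; the function name `mean_from_coded` is a hypothetical helper, not notation from the coursebook.

```python
def mean_from_coded(coded_values, b):
    """Recover the mean of x from values coded as (x - b)."""
    return sum(coded_values) / len(coded_values) + b


x_values = [101, 103, 104, 109, 113]
coded = [x - 100 for x in x_values]  # 1, 3, 4, 9, 13

print(sum(coded) / len(coded))        # 6.0, the mean of the coded values
print(mean_from_coded(coded, 100))    # 106.0
print(sum(x_values) / len(x_values))  # 106.0, the same mean found directly
```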
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FAST FORWARD 


We will study the 
standard normal 
variable, which has a 
mean of 0, in Chapter 8, 
Section 8.2. 


We will see how to use coded totals, such as $\Sigma(x - b)$ and $\Sigma(x - b)^2$,
to find measures of variation in Chapter 3, Section 3.3.


If we remove the bracket from $\Sigma(x - b)$, we obtain $\Sigma x - \Sigma b$. The
term $\Sigma b$ means 'the sum of all the bs' and there are n of them, so
$\Sigma(x - b) = \Sigma x - nb$.
For two datasets coded as (x − a) and (y − b), we can use the totals $\Sigma x$ and $\Sigma y$ to find the
mean of the combined set of values of x and y.


WORKED EXAMPLE 2.9


The exact age of an individual boy is denoted by b, and the exact age of an individual girl is denoted by g. 


Exactly 5 years ago, the sum of the ages of 10 boys was 127.0 years, so $\Sigma(b - 5) = 127.0$.


In exactly 5 years' time, the sum of the ages of 15 girls will be 351.0 years, so $\Sigma(g + 5) = 351.0$.


Find the mean age today of:

a the 10 boys
b the 15 girls
c the 10 boys and 15 girls combined.


Answers

a $\bar{b} = \frac{127}{10} + 5 = 17.7$ years
We update the boys' past mean age by addition.
Alternatively, we expand the brackets:
$\Sigma(b - 5) = \Sigma b - (10 \times 5) = 127$, so
$\Sigma b = 127 + 50 = 177$ and $\bar{b} = \frac{177}{10} = 17.7$ years

b $\bar{g} = \frac{351}{15} - 5 = 18.4$ years
We backdate the girls' future mean age by subtraction.
Alternatively, we expand the brackets:
$\Sigma(g + 5) = \Sigma g + (15 \times 5) = 351$, so
$\Sigma g = 351 - 75 = 276$ and $\bar{g} = \frac{276}{15} = 18.4$ years

c $\Sigma b + \Sigma g = 177 + 276 = 453$, so the mean age of the 25 children is $\frac{453}{25} = 18.12$ years
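
The steps of Worked example 2.9 can be sketched in Python by first undoing each code to recover the uncoded totals. The helper `decoded_total` is an illustrative name of our own; it applies $\Sigma x = \Sigma(x - b) + nb$, so a code of (g + 5) is entered as b = −5.

```python
def decoded_total(coded_total, n, b):
    """Recover the sum of x from the coded total of (x - b) over n values."""
    return coded_total + n * b


boys_total = decoded_total(127.0, 10, 5)    # coded as (b - 5)
girls_total = decoded_total(351.0, 15, -5)  # coded as (g + 5)

print(boys_total / 10)   # 17.7, mean age of the boys
print(girls_total / 15)  # 18.4, mean age of the girls

combined = (boys_total + girls_total) / (10 + 15)
print(round(combined, 2))  # 18.12, mean age of all 25 children
```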


WORKED EXAMPLE 2.10


Forty values of x are coded in the following table.


x − 3 | 0– | 18– | 24–32
Frequency | 9 | 13 | 18


Calculate an estimate of the mean value of x.


Answer

$\bar{x} = \frac{\Sigma(x - 3)f}{\Sigma f} + 3 = \frac{(9 \times 9) + (21 \times 13) + (28 \times 18)}{40} + 3 = 24.45$

We calculate an estimate for the mean of the coded data using class mid-values of 9, 21 and 28,
and then add 3 to obtain our estimate for $\bar{x}$.

It is not necessary to decode the values of x − 3.
EXERCISE 2C


1 For 10 values denoted x, it is given that $\bar{x} = 7.4$. Find:

a $\Sigma x$ b $\Sigma(x + 2)$ c $\Sigma(x - 1)$

2 Twenty-five values of z are such that $\Sigma(z - 7) = 275$. Find $\bar{z}$.

3 Given $\bar{q} = 22$ and $\Sigma(q - 4) = 3672$, find the number of values of q.


4 The lengths of 2500 bolts, x mm, are summarised by $\Sigma(x - 40) = 875$. Find the
mean length of the bolts.


5 Six data values are coded by subtracting 13 from each. Five of the coded values
are 9.3, 5.4, 3.9, 7.6 and 2.2, and the mean of the six data values is 17.6.
Find the sixth coded value.


6 The SD card slots on digital cameras are designed to accommodate a card of
up to 24 mm in width. Due to low sales figures, a manufacturer suspects that
the machine used to cut the cards needs to be recalibrated. The widths, w mm,
of 400 of these cards were measured and are coded in the following table, where
x = w − 24.


ie 0.15<x<-0.1 
32 


a Suggest a reason why the widths have been coded in this way. 


O0.1<x<9 O0.1<x<0.2 


2 


0<x<0.1 


6 


360 


b What percentage of the SD cards are too wide to fit into the slots? 
c Use the coded data to estimate the mean width of these 400 cards. 


7 Sixteen bank accounts have been accidentally under-credited by the following
amounts, denoted by $x.


917.95 917.98 918.03 917.97 918.01 917.94 918.05 918.07
918.02 917.93 918.01 917.88 918.10 917.85 918.11 917.94


To calculate $\bar{x}$ manually, Fidel and Ramon code these figures using (x − 917)
and (x − 920), respectively.


Who has the simpler maths to do? Explain your answer.


8 Throughout her career, an athlete has been timed in 120 of her 400-metre races.
Her times, denoted by t seconds, were recorded on indoor tracks 45 times and are
summarised by $\Sigma(t - 60) = 83.7$, and on outdoor tracks where $\Sigma(t - 65) = -38.7$.
Calculate her average 400-metre running time and comment on the accuracy
of your answer.

You will study the mean of linear combinations of random variables in the
Probability & Statistics 2 Coursebook, Chapter 3.


9 All the interior angles of n triangular metal plates, denoted by y°, are measured.

a State the number of angles measured and write down the value of $\bar{y}$.

b Hence, or otherwise, find the value of $\Sigma(y - 30)$.


10 A dataset of 20 values is denoted by x where $\Sigma(x - 1)$ = 5é. Another dataset of 30
values is denoted by y where $\Sigma(y - 2) = 36$. Find the mean of the 50 values of x and y.
11 Students investigated the prices in dollars ($) of 1 litre bottles of a certain drink
at 24 shops in a town and at 16 shops in surrounding villages. Denoting the town
prices by t and the village prices by v, the students' data are summarised by the
totals $\Sigma(t - 1.1) = 1.44$ and $\Sigma(v - 1.2) = 0.56$.


Find the mean price of 1 litre of this drink at all the shops at which the students 
collected their data. 


A set of data can be coded by multiplication as well as by addition of a constant.


Suppose the monthly take-home salaries of four teachers are $3600, $4200, $3700 and
$4500, which have mean $\bar{x}$ = $4000.

What happens to the mean if all the teachers receive a 10% increase but must pay an extra
$50 in tax each month?

To find their new take-home salaries, we multiply the current salaries by 1.1 and then subtract 50.
The new take-home salaries are $3910, $4570, $4020 and $4900.
The mean is $\frac{3910 + 4570 + 4020 + 4900}{4}$ = $4350.
The original data, x, has been coded by multiplication and by addition as 1.1x − 50.


The mean of the coded data is 4350, which is equal to (1.1 × 4000) − 50, where 4000 = $\bar{x}$.


Data coded as ax − b has a mean of $a\bar{x} - b$.


To find $\bar{x}$ from a total such as $\Sigma(ax - b)$, we can find the mean of the coded data, then undo
'−b' and undo '×a', in that order. That is:

(4350 + 50) ÷ 1.1 or $\frac{1}{1.1} \times (4350 + 50) = 4000$.


For ungrouped data, $\bar{x} = \frac{1}{a}\left[\frac{\Sigma(ax - b)}{n} + b\right]$.

For grouped data, $\bar{x} = \frac{1}{a}\left[\frac{\Sigma(ax - b)f}{\Sigma f} + b\right]$.

These formulae can be summarised by writing $\bar{x} = \frac{1}{a} \times [\text{mean}(ax - b) + b]$.


$\Sigma(ax - b)$ can be rewritten as $a\Sigma x - nb$.
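
The teachers' salaries give a convenient numerical check on decoding data coded as ax − b. In the Python sketch below, the function name `mean_from_linear_code` is our own illustrative choice; it finds the mean of the coded data, then undoes '−b' and '×a' in that order.

```python
def mean_from_linear_code(coded_total, n, a, b):
    """Recover the mean of x from the total of (a*x - b) over n values."""
    coded_mean = coded_total / n
    return (coded_mean + b) / a


salaries = [3600, 4200, 3700, 4500]
coded = [1.1 * x - 50 for x in salaries]  # 10% increase, then $50 extra tax

coded_total = sum(coded)
print(round(coded_total / len(coded), 2))  # 4350.0, the mean of the coded data
print(round(mean_from_linear_code(coded_total, len(coded), 1.1, 50), 2))  # 4000.0

# The coded total also equals a * (sum of x) - n * b.
print(round(1.1 * sum(salaries) - len(salaries) * 50, 2))  # 17400.0
```

Floating-point arithmetic introduces tiny rounding differences, which is why the printed values are rounded.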


WORKED EXAMPLE 2.11 


The total area of cloth produced at a textile factory is denoted by $\Sigma x$ and is measured in square metres. Find
an expression in x for the area of cloth produced in square centimetres.


Answer


1 m = 100 cm
1 m² = 100² cm² = 10 000 cm²
We convert the measurements of x from m² to cm².

Total area, in square centimetres, is $\Sigma 10\,000x$ or $10\,000\Sigma x$.


WORKED EXAMPLE 2.12


For the 20 values of x summarised by $\Sigma(2x - 3) = 104$, find $\bar{x}$.


Answer

$\frac{104}{20} = 5.2$
We first find the mean of the coded values.

$\bar{x} = \frac{5.2 + 3}{2} = 4.1$
Knowing that $2\bar{x} - 3 = 5.2$, we undo the '−3' and then undo the '×2', in that order, to find $\bar{x}$.
Alternatively, we can expand the brackets in $\Sigma(2x - 3)$ to find the value of $\Sigma x$.

$\Sigma(2x - 3) = 104$
$2\Sigma x - (20 \times 3) = 104$
$\Sigma x = 82$
$\bar{x} = \frac{82}{20} = 4.1$

We will see how to use coded totals such as $\Sigma(ax - b)$ and $\Sigma(ax - b)^2$ to find
measures of variation in Chapter 3, Section 3.3.


EXERCISE 2D


1 The masses, x kg, of 12 objects are such that $\bar{x} = 0.475$. Find the value of $\Sigma 1000x$ and state what it represents.

2 The total mass of gold extracted from a mine is denoted by $\Sigma x$, which is measured in grams. Find an
expression in x for the total mass in:

a carats, given that 1 carat is equivalent to 200 milligrams

b kilograms.


3 The area of land used for growing wheat in a region is denoted by $\Sigma w$ hectares. Find an expression in w for the
total area in square kilometres, given that 1 hectare is equivalent to 10 000 m².


4 Speeds, measured in metres per second, are denoted by x. Find the constant k such that kx denotes the speeds
in kilometres per hour.


5 The wind speeds, x miles per hour (mph), were measured at a coastal location at midday on 40 consecutive
days and are presented in the following table.


<x<17 17sx<20 205x<24 24=x<25 


9 13 14 4 


Abel wishes to calculate an estimate of the mean wind speed in kilometres per hour (km/h). He knows that a 
distance of 5 miles is approximately equal to 8km. 


a Explain how Abel can calculate his estimate without converting the given boundary values from miles per 
hour to kilometres per hour. 


b Use the wind speeds in mph to estimate the mean wind speed in km/h. 


6 Given that 15 values of x are such that $\Sigma(3x - 2) = 528$, find $\bar{x}$ and find the value of b such that
$\Sigma(0.5x - b) = 138$.


7 For 20 values of x, it is given that $\Sigma(ax - b) = 400$ and $\Sigma(bx - a) = 545$. Given also that $\bar{x} = 6.25$, find the value
of a and of b.


8 The midpoint of the line segment between A and B is at (5.2, −1.2).


Find the coordinates of the midpoint after the following transformations have been applied to A and to B. 


a T: Translation by the vector . } 
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b E: Enlargement through the origin with scale factor 5. 


c Transformations T and E are carried out one after the other. Investigate whether the location of the mid- 
point of 4% is independent of the order in which the transformations are carried out. 


(M) 9 Five investors are repaid, each with their initial investment increased by p% plus a fixed ‘thank you’ bonus
of $q. The woman who invested $20 000 is repaid double her investment and the man who invested $7500 is
repaid triple his investment. Find the total amount that the five people invested, given that the mean amount
repaid to them was $33 000.


Do you think the method of repayment is fair? Give a reason for your answer.


(PS) 10 One of the units used to measure pressure is pounds per square inch (psi). The mean pressure in the four tyres
of a particular vehicle is denoted by x psi. Given that 1 pound is approximately equal to 0.4536 kg and that
1 metre is approximately equal to 39.37 inches, express the sum of the pressures in the four tyres of this vehicle
in grams per cm².


2.3 The median 


You will recall that the median splits a set of data into two parts with an equal number of
values in each part: a bottom half and a top half. In a set of n ordered values, the median is
at the value half-way between the 1st and the nth.


Consider a DIY store that opens for 12 hours on Monday and for 15 hours on Saturday. 
The numbers of customers served during each hour on Monday and on Saturday last week 
are shown in the following back-to-back stem-and-leaf diagram. 


Monday (12) |   | Saturday (15)
8 6 3 1 0 0 | 2 | 2 3 4 6
  4 3 1 1 0 | 3 | 5 5 6 8 9 9
          1 | 4 | 0 1 3 7 9

Key: 0 | 2 | 2 represents 20 customers on Monday and 22 customers on Saturday


To find the median number of customers served on each of these days, we need to find their 
positions in the ordered rows of the back-to-back stem-and-leaf diagram. 


For Saturday, there are n = 15 values arranged in ascending order from top to bottom and
from left to right. The median is at the ((n + 1)/2)th = ((15 + 1)/2)th = 8th value.


In the first row, we have the 1st to 4th values, and in the second row we have the 5th to
10th values, so the 8th value is 38.


The median number of customers on Saturday was 38. 


For Monday, there are n = 12 values arranged in ascending order from top to bottom and
from right to left. The median is at the ((n + 1)/2)th = ((12 + 1)/2)th = 6.5th value, so we locate the
median mid-way between the 6th and 7th values.

In the first row, we have the 1st to 6th values and the 6th is 28.


The first value in the second row is the 7th value, which is 30.


The median number of customers on Monday was (28 + 30)/2 = 29.
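The position rule used above can be sketched in a few lines of Python. This is only an illustration: the function name `median` and the two lists (read off the back-to-back stem-and-leaf diagram) are our own choices, not part of the course material.

```python
def median(values):
    """Return the median using the ((n + 1)/2)th position rule."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty dataset is undefined")
    position = (n + 1) / 2           # 1-based position; ends in .5 when n is even
    lower = ordered[int(position) - 1]
    if position == int(position):    # odd n: a single middle value
        return lower
    upper = ordered[int(position)]   # even n: mean of the two middle values
    return (lower + upper) / 2


# Customers served per hour, read from the stem-and-leaf diagram
monday = [20, 20, 21, 23, 26, 28, 30, 31, 31, 33, 34, 41]
saturday = [22, 23, 24, 26, 35, 35, 36, 38, 39, 39, 40, 41, 43, 47, 49]

print(median(saturday))  # 8th of 15 values
print(median(monday))    # mid-way between the 6th and 7th of 12 values
```

Sorting first means the function does not depend on the direction in which the leaves are read.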


When data appear in an ordered frequency table of individual values, we can use
cumulative frequencies to investigate the positions of the values, knowing that the median
is at the ((n + 1)/2)th value.

For n ordered values, the median is at the ((n + 1)/2)th value.

For even values of n, the median is the mean of the two middle values.

We can find the 8th value by counting down and left to right from 22 or by counting up and
right to left from 49.

Take care when locating values at the left side of a back-to-back stem-and-leaf diagram; they ascend
from right to left, and descend from left to right, as we move along each row.

n is equal to the total frequency Σf.


WORKED EXAMPLE 2.13


The following table shows 65 ungrouped readings of x. Cumulative frequencies
and the positions of the readings are also shown. Find the median value of x.


x | Frequency (f) | Cumulative frequency | Position
40 | 11 | 11 | 1st to 11th
41 | 23 | 34 | 12th to 34th
42 | 19 | 53 | 35th to 53rd
43 | 8 | 61 | 54th to 61st
44 | 4 | 65 | 62nd to 65th

Answer

The median is at the ((65 + 1)/2)th = 33rd value, which lies among the 12th to 34th readings.

Median value of x is 41.

Frequencies must be taken into account here. Although 41 is not the middle of the five values
of x, it is the middle of the 65 readings.
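As a rough sketch of the cumulative-frequency approach, the following Python function locates the median of an ungrouped frequency table. The function name and its helper are hypothetical; the data are those of Worked example 2.13.

```python
from itertools import accumulate


def median_from_frequencies(values, frequencies):
    """Median of ungrouped data given as ascending values with frequencies."""
    n = sum(frequencies)
    cumulative = list(accumulate(frequencies))

    def value_at(position):
        # Value occupying a 1-based position in the ordered data
        for value, cf in zip(values, cumulative):
            if position <= cf:
                return value
        raise ValueError("position is outside the dataset")

    lower_pos = (n + 1) // 2   # for odd n the two positions coincide
    upper_pos = (n + 2) // 2
    return (value_at(lower_pos) + value_at(upper_pos)) / 2


x = [40, 41, 42, 43, 44]
f = [11, 23, 19, 8, 4]
print(median_from_frequencies(x, f))  # the 33rd of 65 readings
```

For an even total frequency the two positions differ by one, so the function returns the mean of the two middle values.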


Estimating the median

In large datasets and in sets of continuous data, values are grouped and the
actual values cannot be seen. This means that we cannot find the exact value of the
median but we can estimate it. For this type of data, we estimate the median by
reading its value from a cumulative frequency graph. We estimate
the median to be the value whose cumulative frequency is equal to half of the total
frequency.


REWIND

We studied cumulative frequency graphs in Chapter 1, Section 1.4.


On a cumulative frequency graph with total frequency n = Σf, the median is at the (n/2)th value.


Consider the masses of 300 museum artefacts, which are represented in the following
cumulative frequency graph.


[Figure: cumulative frequency graph; horizontal axis: Mass (kg), 0 to 5; vertical axis marked from 0 to 300; the median is marked on the mass axis.]

The graph is only an estimate, so we use n/2 rather than (n + 1)/2 to estimate the median.
This ensures that we arrive at the same position for the median whether we count up
from the bottom or down from the top of the cumulative frequency axis.

The set of data has a total of n = 300 values.

An estimate for the median is the mass of the (n/2)th = (300/2)th = 150th artefact.

We draw a horizontal line from a cumulative frequency value of 150 to the graph. Then, at
the point of intersection, we draw a perpendicular, vertical line down to the axis showing
the masses.


Reading from the graph, we see that the median mass is approximately 2.6kg. 


The concept of representing many different measurements
with one representative value is quite a recent invention.
There are no historical examples of the mean, median or
mode being used before the 17th century.


In trying to find the longitude of Ghanza in modern-day 
Afghanistan, and in studying the characteristics of metals, the 
11th century Persian Al-Biruni is one of the earliest known 
users of a method for finding a representative measure. He 
used the number in the middle of the smallest and largest 
values (what we would call the mid-range) ignoring all but the 
minimum and maximum values. 


The mid-range was used by Isaac Newton and also by 
explorers in the 17th and 18th centuries to estimate their geographic positions. It is likely that 
measuring magnetic declination (i.e. the variation in the angle of magnetic north from true north) 
played a large part in the growth of the mean’s popularity. 


Choosing an appropriate average 

Selecting the most appropriate average to represent the values in a set of data is a
matter for discussion in most situations. Just as it may be possible to choose an average
that represents the data well, so it is often possible to choose an average that badly
misrepresents the data. The purpose and motives behind choosing an average value must
also be considered as part of the equation.


Consider a student whose marks out of 20 in 10 tests are: 3, 4, 6, 7, 8, 11, 12, 13, 17 and 17.
The three averages for this set of data are: mode =17, mean = 9.8 and median = 9.5. 
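The three averages quoted for these marks can be checked with Python's standard `statistics` module. This is just a quick verification sketch; the variable name `marks` is our own.

```python
from statistics import mean, median, multimode

marks = [3, 4, 6, 7, 8, 11, 12, 13, 17, 17]

print(multimode(marks))  # list of the most frequent value(s)
print(mean(marks))       # sum of the marks divided by 10
print(median(marks))     # mid-way between the 5th and 6th ordered marks
```

`multimode` is used rather than `mode` because it returns every most-frequent value, which makes a dataset with more than one mode visible.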


If the student wishes to impress their friends (or parents), they are most likely to use the 
mode as the average because it is the highest of the three. Using either the mean or median 
would suggest that, on average, the student scored fewer than half marks on these tests. 
Do not confuse the median’s position (150th) with its value (2.6 kg).

We will use cumulative frequency graphs to estimate the quartiles, the interquartile range
and percentiles in Chapter 3, Section 3.2.

The mid-range, as you will discover in Chapter 3, Section 3.2, is not the same as the median.


Some of the features of the measures of central tendency are given in the following table.


Measure | Advantages | Disadvantages
Mode | Unlikely to be affected by extreme values. Useful to manufacturers that need to know the most popular styles and sizes. Can be used for all sets of qualitative data. | Ignores most values. Rarely used in further calculations.
Mean | Takes all values into account. Frequently used in further calculations. The most commonly understood average. Can be used to find the sum of the data values. | Cannot be found unless all values are known. Likely to be affected by extreme values.
Median | Can be found without knowing all of the values. Relatively unaffected by extreme values. | Only takes account of the order of the values and so ignores most of them.


As an example of the effect of an extreme value, consider the dataset 40, 40, 70, 100, 130 and
250. If we increase the largest value from 250 to 880, the mode and median are unchanged
(i.e. 40 and 85), but the mean increases by 100% from 105 to 210.
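The effect of the extreme value can be checked numerically. The sketch below takes the third value to be 70, which is the value implied by the stated median (85) and mean (105); the variable names are our own.

```python
from statistics import mean, median, mode

data = [40, 40, 70, 100, 130, 250]
changed = data[:-1] + [880]   # largest value raised from 250 to 880

for label, values in (("original", data), ("changed", changed)):
    # mode and median stay the same; only the mean responds to the change
    print(label, mode(values), median(values), mean(values))
```

Only the mean uses the size of every value, which is why it alone doubles.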


Although the median is usually unaffected by extreme values, this is not always the case, as
the Libor scandal shows.


LIBOR (London Interbank Offered Rates) are average
interest rates that the world’s leading banks charge each
other for short-term loans. They determine the prices that
people and businesses around the world pay for loans or
receive for their savings. They underpin over US $450
trillion worth of investments and are used to assess the
health of the world’s financial system.

[Diagram: rates submitted by four banks, A, B, C and D; the two middle rates, 2.8% and 3.0%, give LIBOR = 2.9%.]


The highest and lowest 25% of the daily rates submitted by a small group of leading banks are
discarded and a LIBOR is then fixed as the mean of the middle 50%. The above diagram shows a
simple example.


Consider how the LIBOR would be affected if bank D submitted a rate of 2.5% instead of 3.1%. 


Several leading banks have been found guilty of manipulating the LIBOR by submitting false rates, 
which has so far resulted in them being fined over US $9 billion. 


You can find out more about the LIBOR scandal by searching news websites. 
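The fixing rule described above (discard the highest and lowest 25%, then average the rest) can be sketched as a trimmed mean. The function name is hypothetical, and bank A's rate of 2.6% is an assumed value chosen for illustration; only the rates for B, C and D appear in the example.

```python
def trimmed_mean(rates, trim=0.25):
    """Mean of the rates left after discarding the lowest and highest `trim` share."""
    ordered = sorted(rates)
    k = int(len(ordered) * trim)            # number discarded at each end
    middle = ordered[k:len(ordered) - k]
    return sum(middle) / len(middle)


# Rates in percent. B, C and D come from the example; A = 2.6 is assumed.
submitted = {"A": 2.6, "B": 2.8, "C": 3.0, "D": 3.1}
print(round(trimmed_mean(submitted.values()), 2))

# Bank D submits 2.5 instead of 3.1
altered = dict(submitted, D=2.5)
print(round(trimmed_mean(altered.values()), 2))
```

Changing one submission alters which rates fall in the middle 50%, so a single bank can move the fixed rate even though its own rate may be discarded. The size of the shift depends on the assumed rate for bank A.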


Consider the number of days taken by a courier company to deliver 100 packages, as given 
in the following table and represented in the bar chart. 


No. days | 1 | 2 | 3 | 4 | 5 | 6
No. packages (f) | 10 | 40 | 25 | 14 | 8 | 3

[Bar chart: frequency against No. days (1 to 6), with a curve drawn over the bars.]

A curve has been drawn over the bars to show the shape of the data.
The mode is 2 days.
The median is between the 50th and 51st values, which is 2.5 days.
The mean = [(1 × 10) + (2 × 40) + (3 × 25) + (4 × 14) + (5 × 8) + (6 × 3)]/100 = 2.79 days
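The three averages for the courier data can be reproduced from the frequency table. In this sketch the frequencies are taken from the terms of the mean calculation, and the variable names are our own.

```python
days = [1, 2, 3, 4, 5, 6]
packages = [10, 40, 25, 14, 8, 3]

n = sum(packages)                                       # total frequency
mean = sum(x * f for x, f in zip(days, packages)) / n   # Σxf / Σf
mode = days[packages.index(max(packages))]              # value with the highest frequency

# Expand the table into the ordered list of values to locate the median.
ordered = [x for x, f in zip(days, packages) for _ in range(f)]
median = (ordered[n // 2 - 1] + ordered[n // 2]) / 2    # n is even here

print(mode, median, mean)
```

Expanding the table is practical only for small datasets; for larger ones, cumulative frequencies locate the middle positions without building the list.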
The mean is the largest average and is to the side of the curve’s longer tail.


The mode is the smallest average and is to the side of the curve’s shorter tail.

The median is between the mode and the mean.

In Chapter 3, we will use a measure of central tendency and a measure of variation
to better describe the values in a set of data.


A set of data that is not symmetrical is said to be skewed. When the curve’s longer tail is to
the side of the larger values, as in the previous bar chart, the data are said to be positively
skewed. When the longer tail is to the side of the smaller values, the data are said to be
negatively skewed.


Generally, we find that:

Mode < median < mean when the data are positively skewed.

Mean < median < mode when the data are negatively skewed.

In Chapter 8, we will study sets of data called normal distributions in which
the mode, mean and median are equal.


EXERCISE 2E


1 The number of patients treated each day by a dentist during a 20-day period is shown in the following
stem-and-leaf diagram.


0 | 4 4 4 5 6 6 6 7
1 | 4 5 5 6 6 7 7 8 8 9
2 | 0 1

Key: 1 | 5 represents 15 patients
a Find the median number of patients. 


b On eight of these 20 days, the dentist arrived late to collect their son from school. If they decide to use
their average number of patients as a reason for arriving late, would they use the median or the mean?
Explain your answer.


c Describe a situation in which it would be to the dentist’s advantage to use a mode as the average.


2 a Find the median for the values of t given in the following table.


t | 7 | 8 | 9 | 10 | 11 | 12 | 13
f | 4 | 7 | 9 | 14 | 16 | 41 | 9


b What feature of the data suggests that t̄ is less than the median? Confirm whether or not this is the case.


3 a Find the median and the mode for the values of x given in the following table. 


x | 4 | 5 | 6 | 7 | 8
f | 14 | 13 | 4 | 12 | 15


b Give one positive and one negative aspect of using each of the median and the mode as the average value 
for x. 


c Some values in the table have been incorrectly recorded as 8 instead of 4. Find the number of incorrectly
recorded values, given that the true median of x is 5.5.


4 [Figure: cumulative frequency graph; horizontal axis: Time (min), 0 to 10; vertical axis marked at 40, 80 and 120.]


a Estimate the median time taken.

b The median is used to divide these people into two groups. Find the median time taken by each of the
groups.

5 The masses, m kilograms, of 148 objects are summarised in the following table.


Mass (m kg) | m < 0 | m < 0.2 | m < 0.3 | m < 0.5 | m < 0.7 | m < 0.8
Cumulative frequency | 0 | 16 | 28 | 120 | 144 | 148


Construct a cumulative frequency polygon on graph paper, and use it to estimate the number of objects with
masses that are:


a within 0.1 kg of the median


b more than 200 g from the median.


6 A teacher recorded the quiz marks of eight students as 11, 13, 15, 15,17, 18,19 and 20. 
They later realised that there was a typing error, so they changed the mark of 11 to 1. 


Investigate what effect this change has on the mode, mean and median of the students’ marks. 


7 The following table shows the lifetimes, to the nearest 10 days, of a certain brand of light bulb.


Lifetime (days) | 90–100 | 110–120 | 130–140 | 150–160 | 170–190 | 200–220 | 230–260
No. light bulbs (f) | 12 | 28 | 54 | 63 | 41 | 16 | 6


a Use upper class boundaries to represent the data in a cumulative frequency graph and estimate the median 
lifetime of the light bulbs. 


b How might the manufacturer choose a value to use as the average lifetime of the light bulbs in a publicity
campaign? Based on the figures in the table, investigate whether it would be to the manufacturer’s
advantage to use the median or the mean.


8 It is claimed on the packaging of a brand of battery that they can run a standard kitchen clock continuously
for ‘at least 150 days on average’. Tests are carried out to find the length of time, t hours, that a standard
kitchen clock runs using one of these batteries. The results are shown in the following table.


Time (t hours) | 3000 ≤ t < 3096 | 3096 ≤ t < 3576 | 3576 ≤ t < 3768 | 3768 ≤ t < 3840
No. batteries (f) | 34 | 66 | 117 | 33


What could the words on the packaging mean? Test the claim by finding the mean, the median and the modal
class. What conclusions, if any, can you make about the claim?


9 Homes in a certain neighbourhood have recently sold for $220 000, $242 000, $236 000 and $3 500 000.
A potential buyer wants to know the average selling price in the neighbourhood. Which of the mean, median
or mode would be more helpful? Explain your answer.


10 A study was carried out on 60 electronic items to find the currents, x amperes, that could be safely passed 
through them at a fixed voltage before they overheat. The results are given in the two tables below. 


x | 0.5 | 1.5 | 2.0 | 3.5 | 5.0
No. items | 60 | 48 | 20 | 6 | 0

x | 0.5 | 1.5 | 2.0 | 3.5 | 5.0
No. items | 0 | p | q | r | 60


a Find the value of p, of q and of r.


b Cumulative frequency graphs are drawn to illustrate the data in both tables.

i Describe the transformation that maps one graph onto the other.


ii Explain the significance of the point where the two graphs intersect.


11 The lengths of extra-time, t minutes, played in the first and second halves of 100 football matches are
summarised in the following table.


Extra-time (t min) | t < 1 | t < 2 | t < 4 | t < 5 | t < 7 | t < 9
First halves (cumulative frequency) | 24 | 62 | 80 | 92 | 97 | 100
Second halves (cumulative frequency) | 6 | 17 | 35 | 82 | 93 | 100


a Explain how you know that the median extra-time played in the second halves is greater than
in the first halves.


b The first-half median is exactly 100 seconds. 


i Find the upper boundary value of k, given that the second-half median is k times longer than the first- 
half median. 


ii Explain why the mean must be greater than the median for the extra-time played in the first halves. 
(PS) 12 Eighty candidates took an examination in Astronomy, for which no candidate scored more than 80%. The
examiners suggest that five grades, A, B, C, D and E, should be awarded to these candidates, using upper
grade boundaries 64, 50, 36 and 26 for grades B, C, D and E, respectively. In this case, grades A, B, C, D and
E will be awarded in the ratio 1 : 3 : 5 : 4 : 3.


a Using the examiners’ suggestion, represent the scores in a cumulative frequency polygon and use it to 
estimate the median score. 


b All of the grade boundaries are later reduced by 10%. Estimate how many candidates will be awarded a
higher grade because of this.


13 The values of x shown in the following table are to be represented in a bar chart.


x | 5 | 6 | 7 | 8 | 9 | 10 | 11
Frequency | 2 | 5 | 9 | 10 | 9 | 5 | 2


a i Sketch a curve that shows the shape of the data.
ii Find the mode, mean and the median of x.


b The two smallest values of x (i.e. 5 and 5) are changed to 21 and 31. Investigate the effect that this has on 
the mode, the mean, the median and on the shape of the curve. 


c If, instead, the two largest values of x (i.e. 11 and 11) are changed to —9 and b, so that the mean of x decreases 
by 1, find the value of b and investigate the effect that this has on the mode, the median and the shape of the 
curve. 


14 A histogram is drawn to illustrate a set of continuous data whose mean and median are equal. Make sketches 
of the different types of curve that could be drawn to represent the shape of the histogram. 


15 Students’ marks in a Biology examination are shown by percentage in the following table. 


Marks (%) | 20– | 30– | 40– | 50– | 60– | 70– | 80–90
Frequency (%) | 5 | 10 | 20 | 30 | 20 | 10 | 5


a Without drawing an accurate histogram, describe the shape of the set of marks. What does the shape 
suggest about the values of the mean, the median and the mode? 


b Information is provided about the marks in examinations in two other subjects: 


Chemistry: mode > median > mean
Physics: mean > median > mode


Sketch a curve to show the shape of the distribution of marks in each of these exams. 


Copyright Material - Review Only - Not for Redistribution 


Cambridge International AS & A Level Mathematics: Probability & Statistics 1

Checklist of learning and understanding


Measures of central tendency are the mode, the mean and the median.

For ungrouped data, the mode is the most frequently occurring value.

For grouped data, the modal class has the highest frequency density and the greatest height
column in a histogram.


For ungrouped data, x̄ = Σx/n.

For grouped data, x̄ = Σxf/Σf or Σfx/Σf.

The formulae for ungrouped and grouped coded data can be summarised by:

x̄ = mean(x − b) + b
x̄ = (1/a) × [mean(ax − b) + b]

For ungrouped coded data:

x̄ = Σ(x − b)/n + b
x̄ = (1/a) × [Σ(ax − b)/n + b]

For grouped coded data:

x̄ = Σ(x − b)f/Σf + b
x̄ = (1/a) × [Σ(ax − b)f/Σf + b]

For ungrouped data, the median is at the ((n + 1)/2)th value.

For grouped data, we estimate the median to be at the (n/2)th value on a cumulative frequency
graph.
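The coded-data formula for ungrouped data can be sketched as a small Python function. The function name and its parameters are hypothetical, and the totals Σ(2x − 3) = 104 for 20 values are used purely as an illustration.

```python
def mean_from_coded_total(coded_total, n, a=1, b=0):
    """Recover the mean of x from the total Σ(ax − b) taken over n values."""
    # mean(ax − b) = coded_total / n, so the mean of x is (mean(ax − b) + b) / a
    return (coded_total / n + b) / a


# 20 values with Σ(2x − 3) = 104, so a = 2 and b = 3
print(round(mean_from_coded_total(104, 20, a=2, b=3), 2))
```

With the defaults a = 1 and b = 0 the function reduces to the ordinary mean Σx/n.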
END-OF-CHAPTER REVIEW EXERCISE 2


1 For each of the following sets of data, decide whether you would expect the mean to be less than, equal to or 
greater than the median and the mode. 


a The ages of patients receiving long-term care at a hospital. [1] 
b The numters of goals scored in football matches. [1] 
c The heights of adults living in a particular city. [1] 


2 The mean mass of 13 textbooks is 875 grams, and n novels have a total mass of 13 706 grams. Find the 
mean mass of a novel, given that the textbooks and novels together have a mean mass of 716.6 grams. [3] 


3 Nine values are 7, 13, 28, 36, 13, 29, 31, 13 and x.


a Write down the name and the value of the measure of central tendency that can be found without 


knowing the value of x. [1] 
b If it is known that x is greater than 40, which other measure of central tendency can be found and

what is its value? [1] 
c If the remaining measure of central tendency is 25, find the value of x. [2] 


4 For the data shown in the following table, x has a mean of 7.15.


x | 3 | 6 | 10 | 15
Frequency | a | b | c | d


a Find the mean value of y given in the following table. [1]


y | 11 | 14 | 18 | 23
Frequency | a | b | c | d


b Find a calculated estimate of the mean value of z given in the following table. [2]


z | 2– | 8– | 14– | 24–34
Frequency | a | b | c | d
a b c d 


5 The table below shows the number of books read last month by a group of children.


V BREE 
Aa 
NO. 64% 3] 8 | 15| 4 
a If the mean number of books read is exactly 3.75, find the value of q. [2]


b Find the greatest possible value of q if: 
i the modal number of books read is 4 [1] 
ii the median number of books read is 4. [1] 


6 The following table gives the heights, to the nearest 5cm, of a group of people. 


120-135 , 140-150 | 155-160 | 165-170 | 175-185 


Given that the modal class is 140-150 cm, find the least possible value of p. [3]


7 The following histogram illustrates the masses, m kilograms, of the 216 sales of
hay that a farmer made to customers last year.

[Histogram: frequency density against Mass (m kg), 30 to 80.]

a Show that a calculated estimate of the mean is equal to the median. [4]


b Estimate the price per kilogram at which the hay was sold, given that these sales
generated exactly $1944. Why is it possible that none of the customers
actually paid this amount per kilogram for the hay? [4]


8 An internet service provider wants to know how customers rate its services. A questionnaire asks
customers to tick one of the following boxes.


excellent □   good □   average □   poor □   very poor □


a How might the company benefit from knowing each of the available average responses of its
customers? [2]


b What additional benefit could the company obtain by using the following set of tick boxes instead?

excellent = 5 □   good = 4 □   average = 3 □   poor = 2 □   very poor = 1 □


9 The numbers of items returned to the electrical department of a store on each of 100 consecutive days are
given in the following table.


No. items | 0 | 1 | 2 | 3 | 4 | 5 | 6–p
No. days | 49 | 16 | 10 | 9 | 7 | 5 | 4


a Write down the median. [1] 
b Is the mode a good value to use as the average in this case? Give a reason for your answer. [1] 
c Find the value of p, given that a calculated estimate of the mean is 1.5. [3] 


d Sketch a curve that shows the shape of this set of data, and mark onto it the relative positions of 
the mode, the mean and the median. [2] 


10 As part of a data collection exercise, members of a certain school year group were asked how long they spent
on their Mathematics homework during one particular week. The times are given to the nearest 0.1 hour. The
results are displayed in the following table.


i Draw, on graph paper, a histogram to illustrate this information. [5] 


ii Calculate an estimate of the mean time spent on their Mathematics homework by members of this year
group. [3]
Cambridge International AS & A Level Mathematics 9709 Paper 6 Q5 June 2008


11 For 150 values of x, it is given that Σ(x − 1) + Σ(x − 4) = 4170. Find x̄. [3]


(PS) 12 On Monday, a teacher asked eight students to write down a number, which is denoted by x. On Tuesday,
when one of these students was absent, they asked them to add 1 to yesterday’s number and write it down.
Find the number written down on Monday by the student who was absent on Tuesday, given that
Σx = 304, and that the mean of Monday’s and Tuesday’s numbers combined was 27 4. [3]


13 A delivery of 150 boxes, each containing 29 items, is made to a retailer. The numbers of damaged items in the 
boxes are shown in the following table. 


0 1 2 3 4 5 6 or more 
100 10 10 10 10 10 0 


a Find the mode, the mean and the median number of damaged items. [3] 


b Which of the three measures of central tendency would be the most appropriate to use as the average
in this case? Explain why using the other two measures could be misleading. [2] 


14 The monthly salaries, w dollars, of 10 women are such that Σ(w − 3000) = −200.
The monthly salaries, m dollars, of 20 men are such that Σ(m − 4000) = 120.


a Find the difference between the mean monthly salary of the women and the mean monthly salary of 
the men. [3] 


b Find the mean monthly salary of all the women and men together. [3] 


15 For 90 values of x and 64 values of y, it is given that Σ(x − 1) = 72.9 and Σ(y + 1) = 201.6. Find the mean
value of all the values of x and y combined. [5]


In this chapter you will learn how to:


find and use different measures of variation

use a cumulative frequency graph to estimate medians, quartiles and percentiles

calculate and use the standard deviation of a set of data (including grouped data) either from the
data itself or from given totals Σx and Σx², or coded totals Σ(x − b) and Σ(x − b)², and use such
totals in solving problems that may involve up to two datasets.


PREREQUISITE KNOWLEDGE


Where it comes from | What you should be able to do | Check your skills

IGCSE / O Level Mathematics | Accurately label and read from an axis, using a given scale. | 1 The numbers 2 and 18 are marked on an axis 20 cm apart. How far apart are the numbers 4.5 and 17.3 on this axis?


Substitute into and manipulate 2 bY 
: aa x oe 
algebraic formulae containing If y= a. (2) , find the positive 


squares and square roots. 
q q value of: 


a ywhen x=13,a=4 and b= 4352 
b xwhen y=12,a=5 and b=11. 


How do we best summarise a set of data? 

A measure of central tendency alone does not describe or summarise a set of data fully. 
Although it may tell us the location of the more central values or the most common values,
it tells us nothing about how widely spread out the values are. Two sets of data can have the 
same mean, median or mode, yet they can be completely different. A better description of a 
set of data is given by a measure of central tendency and a measure of variation. Variation is 
also known as spread or dispersion. 


Consider the runs scored by two batters in their past eight cricket matches, which are given 
in the following table. 


24 | Total: 224 
Total: 224 


The mean number of runs scored by A and by B is the same; namely, 224 + 8 = 28. However, 
the patterns of the number of runs are clearly very different. The numbers for batter A are 
quite consistent, whereas the numbers for batter B are quite varied. This consistency (or lack 
of it) can be indicated by a measure of variation, which shows how spread out a set of data 
values are. 


Three commonly used measures of variation are the range, interquartile range and
standard deviation. 


3.1 The range 

As you will recall, the range is the numerical difference between the largest and smallest 
values in a set of data. One advantage of using the range is that it is easy to calculate. 
However, it does not take the more central values into account but uses only the most 
extreme values. It is often more informative to state the minimum and maximum values 
rather than the difference between them. 


For example, in a test for which the lowest mark is 6 and the highest mark is 19, the range 
is 19-6=13. 


For grouped data, we can find a minimum and maximum possible range, using the lower and 
upper boundary values of the data. 
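As a minimal sketch of this idea, the function below returns the least and greatest possible range when the smallest and largest values have been rounded. The function name and the `precision` parameter are our own; the heights used (150 cm and 169 cm, recorded to the nearest centimetre) are illustrative.

```python
def range_bounds(smallest, largest, precision=1):
    """Least and greatest possible range for values rounded to the nearest `precision`."""
    half = precision / 2
    least = (largest - half) - (smallest + half)      # boundaries moved inwards
    greatest = (largest + half) - (smallest - half)   # boundaries moved outwards
    return least, greatest


# Heights recorded to the nearest centimetre: shortest 150 cm, tallest 169 cm
print(range_bounds(150, 169))
```

The two results always differ by twice the rounding precision, because each end of the range can move by half a unit in either direction.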
WORKED EXAMPLE 3.1


To the nearest centimetre, the tallest and shortest pupils in a class are 169 cm and 150 cm.


Find the least and greatest possible range of the students’ heights.


Answer

The intervals in which the given heights, h, lie are 168.5 ≤ h < 169.5 cm and 149.5 ≤ h < 150.5 cm.

Least possible range = 168.5 − 150.5 = 18 cm

Greatest possible range = 169.5 − 149.5 = 20 cm


3.2 The interquartile range and percentiles

The lower quartile, median and upper quartile, as you will recall, divide the values in a
dataset into four parts, with an equal number of values in each part.


These three measures are commonly abbreviated by:


• Q1 for the lower quartile

• Q2 for the median (or middle quartile)

• Q3 for the upper quartile.

The interquartile range is the numerical difference between the upper quartile and the
lower quartile, and gives the range of the middle half (50%) of the values, as shown in the
following diagram.


[Diagram: a number line marked, from left to right, with the smallest value, Q1, Q2, Q3 and the largest value; the interquartile range spans Q1 to Q3.]


The interquartile range is often preferred to the range because it gives a measure of how
varied the more central values are. It is relatively unaffected by extreme values, also called
outliers, and can be found even when the exact values of these are not known.


Ungrouped data

The positions of the lower and upper quartiles depend on whether there are an odd or even
number of values in the set of data. One method that we can use to find the quartiles is as follows.


For an even number of ordered values: we split the data into a lower half and an upper
half. Then Q1 and Q3 are the medians of the lower half and upper half, respectively.


For an odd number of ordered values: we split the data into a lower half and an upper half
at the median, which we then discard. Again, Q1 and Q3 are the medians of the lower half
and upper half, respectively.

In a set of ungrouped data, the median is always at the ((n + 1)/2)th value. However, it is
advisable to find the quartiles by inspection rather than by memorising formulae
for their positions.


WORKED EXAMPLE 3.2


Find the interquartile range of the eight ordered values 2, 5, 9, 13, 29, 33, 49 and 55.


Answer

1st  2nd | 3rd  4th | 5th  6th | 7th  8th
 2    5  |  9   13  | 29   33  | 49   55
         Q1         Q2         Q3

IQR = Q3 − Q1
    = (33 + 49)/2 − (5 + 9)/2
    = 41 − 7
    = 34


WORKED EXAMPLE 3.3


Find the interquartile range of the seven values 69, 17, 43, 6, 73, 77 and 39.


Answer

In ascending order, the values are:

1st  2nd  3rd  4th  5th  6th  7th
 6   17   39   43   69   73   77
     Q1        Q2        Q3

IQR = Q3 − Q1
    = 73 − 17
    = 56

WORKED EXAMPLE 3.4


Find the interquartile range of the 13 grouped values shown in the following
stem-and-leaf diagram.


14 | 2 2 4 8 9
15 | 1 3 5 6 7 9
16 | 5 8

Key: 14 | 2 represents 142


REWIND

We studied stem-and-leaf diagrams in Chapter 1, Section 1.2.


Answer

Q2 is at the ((13 + 1)/2)th = 7th value, which is 153.

Q1 = (144 + 148)/2 = 146 and Q3 = (157 + 159)/2 = 158

IQR = 158 − 146
    = 12
=12 
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In this activity, you will investigate the value of the median in relation to the 
smallest and largest values in a set of data, and also in relation to the lower and 
upper quartiles. 


For each ordered set of data, A to D, write down these five values: the smallest value; 
the lower quartile; the median; the upper quartile; and the largest value. 


Set A: 2, 2, 3, 11, 11, 21, 22. 

Set B: 6, 6, 6, 11, 13, 17, 19, 20. 

Set C: 9, 15, 28, 32, 35, 49. 

Set D: 5, 7, 9, 10, 11, 12, 12, 16, 17. 

It may be useful to mark the five values for each dataset on a number line.


Use your results to decide which of the following statements are always, sometimes 
or never true. 


The median is mid-way between the smallest and largest values. 
The median is mid-way between the lower and upper quartiles. 
The interquartile range is equal to exactly half of the range. 


$Q_3 - Q_2 > Q_2 - Q_1$


Grouped data

We can use a cumulative frequency graph to estimate values in any position in a
set of data. This includes the lower quartile, the upper quartile and any chosen
percentile.

REWIND

We estimated the median from a cumulative frequency graph in Chapter 1, Section 1.4.


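For grouped data with total frequency $n = \Sigma f$, the positions of the quartiles are shown in the
following table.

| Quartile | lower ($Q_1$) | median ($Q_2$) | upper ($Q_3$) |
| Position | $\frac{n}{4}$ or $\frac{1}{4}\Sigma f$ | $\frac{n}{2}$ or $\frac{1}{2}\Sigma f$ | $\frac{3n}{4}$ or $\frac{3}{4}\Sigma f$ |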


The nth percentile is the value that is n% of the way through a set of data.
$Q_1$, $Q_2$ and $Q_3$ are the 25th, 50th and 75th percentiles, respectively.


In an ordered dataset with, say, 320 values, $Q_1$, $Q_2$ and $Q_3$ are at the 80th, 160th and
240th values, and the 90th percentile is at the (0.90 × 320) = 288th value.


The range of the middle 80% of a dataset is the difference between the 10th and 
90th percentiles. 
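
The positions described above, and the idea of reading a value off a cumulative frequency polygon, can be sketched as follows. This is an illustrative sketch only: the function names are invented here, and the small polygon at the end uses hypothetical boundaries and cumulative frequencies rather than data from this chapter.

```python
def percentile_position(p, n):
    """Position of the p-th percentile among n values, using the 'p% of n' convention."""
    return p * n / 100


def value_at_position(position, boundaries, cumulative_frequencies):
    """Read a value off a cumulative frequency polygon by linear interpolation."""
    points = list(zip(boundaries, cumulative_frequencies))
    for (x0, c0), (x1, c1) in zip(points, points[1:]):
        if c1 > c0 and c0 <= position <= c1:
            return x0 + (position - c0) / (c1 - c0) * (x1 - x0)
    raise ValueError("position is outside the range of cumulative frequencies")


# The dataset of 320 values mentioned in the text.
positions = {p: percentile_position(p, 320) for p in (25, 50, 75, 90)}
print(positions)  # {25: 80.0, 50: 160.0, 75: 240.0, 90: 288.0}

# Hypothetical polygon for 80 values, invented for illustration only.
boundaries = [0, 10, 20, 30]
cumulative = [0, 20, 60, 80]
p10 = value_at_position(percentile_position(10, 80), boundaries, cumulative)
p90 = value_at_position(percentile_position(90, 80), boundaries, cumulative)
middle_80_range = p90 - p10  # range of the middle 80% of the data
```

Linear interpolation matches a polygon drawn with straight line segments; a value read from a smooth cumulative frequency curve may differ slightly.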
WORKED EXAMPLE 3.5


The following graph illustrates the times, in minutes, taken by 500 people to complete a task. Use the graph 
to find an estimate of: 


a the greatest possible range b the interquartile range c the 95th percentile. 

[Graph: cumulative frequency (0 to 500) plotted against Time (min), from 5 to 30, with the positions of $Q_1$, $Q_3$ and the 95th percentile marked on the time axis.]

Answer 
a 30 − 5 = 25 min
The greatest possible range is equal to the width of the polygon.

b $Q_1 \approx 8.0$ min
$Q_3 \approx 14.5$ min
$\text{IQR} = Q_3 - Q_1$
= 14.5 − 8.0
= 6.5 min

c ≈ 24.0 min
The 95th percentile is at the (0.95 × 500) = 475th value.
Box-and-whisker diagrams

A box-and-whisker diagram (or box plot) is a graphical representation of data, showing 
some of its key features. These features are its smallest and largest values, its lower and 
upper quartiles, and its median. 


If drawn by hand, the diagram is best drawn on graph paper and must include a scale. 


It takes the form shown in the following diagram, which shows some features of a dataset 
denoted by x. 


[Diagram: box-and-whisker diagram for x, with labels for the smallest value, the box and the whiskers.]


Key features of the data for x represented in the box-and-whisker diagram are:

Median: $Q_2 = 6$

Range = 14 − 1 = 13

$\text{IQR} = Q_3 - Q_1 = 11 - 4 = 7$

The following box-and-whisker diagram is a representation of a dataset denoted by y, 
drawn using the same scale as the previous diagram. 


[Diagram: box-and-whisker diagram for y.]


The following table shows a measure of central tendency and two measures of variation for 
each of x and y, and we can use these to make comparisons. 


By comparing medians, values of x are, on average, less than values of y. 
By comparing ranges and interquartile ranges, values of y are more varied than values of x. 


We can assess the skewness of a set of data using the quartiles in a box-and-whisker 
diagram. 


In the previous box-and-whisker diagram for x, $Q_3 - Q_2 > Q_2 - Q_1$, and the longer tail of the
curve drawn over a bar chart would be to the side of the larger values. This means that the
data for x is positively skewed.


In the previous box-and-whisker diagram for y, $Q_2 - Q_1 > Q_3 - Q_2$, and the longer tail of the
curve would be to the side of the smaller values. This means that the data for y is negatively
skewed.


A reasonably symmetrical set of data would have $Q_3 - Q_2 = Q_2 - Q_1$.
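
The comparison of the two quartile gaps can be written as a small check. This is only a sketch with an invented function name. It tests for exact equality, whereas in practice a dataset is described as reasonably symmetrical when the two gaps are close rather than identical.

```python
def quartile_skew(q1, q2, q3):
    """Classify skewness by comparing the gaps either side of the median."""
    upper_gap = q3 - q2
    lower_gap = q2 - q1
    if upper_gap > lower_gap:
        return "positive"
    if lower_gap > upper_gap:
        return "negative"
    return "symmetrical"


# Dataset x from the text: Q1 = 4, Q2 = 6, Q3 = 11.
print(quartile_skew(4, 6, 11))  # positive
```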
The whisker (which shows the range) is not drawn through the box (which shows the interquartile range).

Items and, where appropriate, units such as ‘Length (cm)’ and ‘Mass (kg)’ must be indicated on the diagram.

We looked briefly at positively and negatively skewed sets of data in Chapter 2, Section 2.3.


EXERCISE 3A


1 Find the range and the interquartile range of the following sets of data.
a 5,8, 13,17, 22, 25, 30 

b 7, 13, 21, 2, 37, 28, 17, 11, 2 

c 42, 47, 39, 51, 73, 18, 83, 29, 41, 64 

d 113, 97, 36, 81, 49, 41, 20, 66, 28, 32, 17, 107 

e 4.6, 0, −2.6, 0.8, −1.9, −3.3, 5.2, −3.2


2 a Find the range and the interquartile range of the dataset represented in the following box plot.


b What type of skewness would you expect this set of data to have?


3 The following stem-and-leaf diagram shows the marks out of 50 obtained by 15 students in a Science test.


o|o Key: 2|5 
1/4 represents a 
i O mark of 25 
413568 out of 50 
5|00 


a Find the range and interquartile range of the marks. 
b Illustrate the data in a box-and-whisker diagram on graph paper and include a scale. 


c For this set of data, express $Q_3$ in terms of $Q_1$ and $Q_2$.


4 The numbers of fouls made in eight hockey matches and in eight football matches played at the weekend are
shown in the following back-to-back stem-and-leaf diagram.


Hockey (8) |__| Football (8) Key: 6 | 1|8 
represents 16 fouls 

99 in hockey and 18 

233 fouls in football 


= oo 


a Is it true to say that the numbers of fouls in the two sports are equally varied? Explain your answer.

b Draw two box-and-whisker diagrams using the same scales. Write a sentence to compare the numbers of 
fouls committed in the two sports. 

5 Rishi and Daisy take the same seven tests in Mathematics, and both students’ marks improve on successive
tests. Their percentage marks are as follows.

| Rishi’s marks | 15 | 24 | 28 | 33 | 39 | 42 | 50 |
| Daisy’s marks | 51 | 65 | 69 | 72 | 78 | 83 | 86 |


a Explain why it would not be useful to use the range or the interquartile range alone as measures for 
comparing the marks of the two students. 


b Name two measures that could be used together to give a meaningful comparison of the two students’ 
marks. 
6 The following table shows the maximum speeds, s km/h, of some vintage cars.

| s < 35 | 0 |
| 35 ≤ s < 40 | 20 |
| 40 ≤ s < 45 | 65 |
| 45 ≤ s < 50 | 10 |
| 50 ≤ s < 55 | 27 |
| 55 ≤ s < 70 | 13 |

a On the same sheet of graph paper, construct a cumulative frequency polygon and a box-and-whisker diagram to
illustrate the data.

b Use your box-and-whisker diagram to assess what type of skewness the data have.

7 Twenty adults are selected at random, and each is asked to state the number of trips abroad that they have made. The
results are shown in the following back-to-back stem-and-leaf diagram.

      Males (9) |   | Females (11)        Key: 3 | 1 | 1 represents 13 trips for a male and 11 trips for a female
    8 3 2 0 0 0 | 0 | 3 4 5
            5 3 | 1 | 1 1 2 3 6
[third row not recoverable]

a i Draw box-and-whisker diagrams, using the same scales for males and for females.

ii Interpret the key features of the data represented in your diagrams and compare the data for the two groups of adults.

b In a summary of the data, a student writes, ‘The females have
visited more countries than the males.’ Is this statement justified?
Give a reason to support your answer.
8 The resistances, in ohms (Ω), of 100 conductors are represented in the following graph.

[Graph: cumulative frequency (0 to 100) plotted against Resistance (Ω).]

Find, to an appropriate degree of accuracy, an estimate of:

a the interquartile range

b the 90th percentile

c the percentile that is equal to 0.192 Ω

d the range of the middle 40% of the resistances.
[Graph: Cumulative frequency (No. circuit boards), with ticks at 50, 100, 150 and 200, plotted against Area (cm²), from 8 to 56.]

a State the greatest possible range of the data.
b Construct a box-and-whisker diagram to illustrate the data. 
c Find the range of the middle 60% of the areas. 


d An outlier is an extreme value that is more than 1.5 times the interquartile range above the upper quartile 
or more than 1.5 times the interquartile range below the lower quartile. Find the areas that define the 
outliers in this set of data, and estimate how many there are. How accurate is your answer? 


10 A company manufactures right-angled brackets for use in the
construction industry. A sample of brackets are measured, and
the number of degrees by which their angles deviate from a right
angle are summarised in the following table.

| Deviation from 90° (d) | No. brackets (f) |
| d < −1.5 | 0 |
| −1.5 ≤ d < −1.0 | 24 |
| −1.0 ≤ d < −0.5 | 46 |
| −0.5 ≤ d < 0.0 | [illegible] |
| 0.0 ≤ d < 0.5 | 34 |
| 0.5 ≤ d < 1.0 | 34 |
| 1.0 ≤ d < 1.5 | [illegible] |
| 1.5 ≤ d < 2.5 | 17 |

a Draw a cumulative frequency polygon to illustrate these deviations.

b Estimate the median and the interquartile range of the bracket angles, giving both answers correct to 1 decimal
place.

c A bracket is considered unsuitable for use if its angle deviates from a right angle by more than 1.2°. Estimate what
percentage of this sample is unsuitable for use, giving your answer correct to the nearest integer.


11 The following table shows the cumulative frequencies for values of x.

| x | <0 | <10 | <15 | <25 | <30 | <40 |
| Cumulative frequency | 0 | 12 | 30 | 90 | 102 | 120 |


Without drawing a cumulative frequency graph, find: 
a the interquartile range 


b the 85th percentile. 
(M) 12 Fifty 10-gram samples of a particular type of mushroom are collected by volunteers at a university and tested.
The following table shows the mass of toxins, in hundredths of a gram, in these samples.

| 0– | 4– | 11– | 17– | 20–30 |
| 2 | 19 | 23 | 3 | 3 |


a Draw a cumulative frequency curve to illustrate the data. 
b Use your curve to estimate, correct to 2 decimal places: 

i the interquartile range 

ii the range of the middle 80%. 


c It was found that toxins made up between 0.75% and 2.25% of the mass of n of these samples. Use your
curve to estimate the value of n.


d Make an assessment of the variation in the percentage of toxic material in these samples. Can you suggest
any possible reasons for such variation?


(M) 13 A 9-year study was carried out on the pollutants released when biomass fuels are used for cooking. 
Researchers offered nearly 1000 people living in 12 villages in southern China access to clean biogas and 
to improved kitchen ventilation. Some people took advantage of neither; some changed to clean fuels; 
some improved their kitchen ventilation; and some did both. The following diagram shows data on the 
concentrations of nitrogen dioxide in these people’s homes at the end of the study. 


[Diagram: nitrogen dioxide pollutant concentration (mg/m³), on a scale from 0 to 0.75, for four groups: Neither, Clean fuels only, Ventilation only, Both.]


Study the data represented in the diagram and then write a brief analysis that summarises the results of this 
part of the study. 


The study of human physical growth, auxology, is a 
multidisciplinary science involving genetics, health sciences, 
sociology and economics, among others. 


Exceptional height variation in populations that share 

a genetic background and environmental factors is 
sometimes due to dwarfism or gigantism, which are medical 
conditions caused by specific genes or abnormalities in the 
production of hormones. In regions of poverty or warfare, 
environmental factors, such as chronic malnutrition during 
childhood, may result in delayed growth and/or significant 
reductions in adult stature even without the presence of these 
medical conditions. 


At the time of their meeting in London in 2014, Chandra 
Bahadur Dangi (at 54.6 cm) and Sultan Kosen (at 
254.3 cm) were the shortest and tallest adults in the world. 
The interquartile range is based around the median. In this exploration, we investigate
a possible way to define variation based on the mean.


Choose a set of five numbers with a mean of 10.


The deviation of a number tells us how far and to which side of the mean it is. Numbers
greater than the mean have a positive deviation, whereas numbers less than the mean
have a negative deviation, as indicated in the following diagram.

[Diagram: number line with the mean marked; negative deviation on one side of the mean, positive deviation on the other.]


Find the deviation of each of your five numbers and then calculate the mean
deviation. Compare and discuss your results, and investigate other sets of numbers.


Can you predict what the result will be for any set of five numbers with a mean of 10? 
Can you justify your prediction? What would you expect to happen if you started 
with any set of five numbers? 


3.3 Variance and standard deviation

In the Explore 3.2 activity, you discovered that the mean deviation is not a useful way of
measuring the variation of a dataset because the positive and negative deviations cancel
each other out. So, if we want a measure of variation around the mean, we need to ensure
that each deviation is positive or zero.

We can do this by calculating the mean distance of the data values from the mean, which
we call the ‘mean absolute deviation from the mean’, $\frac{\Sigma|x-\bar{x}|}{n}$.

$|x-\bar{x}|$ means we calculate $x-\bar{x}$ and remove the minus sign if the answer is negative.

However, it is hard to calculate this accurately or efficiently for large sets of data and it is
difficult to work with algebraically, so this approach is not used in practice.


Alternatively, we can calculate the squared deviation, $(x-\bar{x})^2$, for all data values and find
their mean. This is the ‘mean squared deviation from the mean’, which we call the variance
of the data.

$$\text{Var}(x) = \frac{\Sigma(x-\bar{x})^2}{n}$$

For measurements and deviations in metres, say, the variance is in m². So, to get a measure
of variation that is also in metres, we take the square root of the variance, which we call the
standard deviation.

$$\text{Standard deviation of } x = \sqrt{\text{Var}(x)} = \sqrt{\frac{\Sigma(x-\bar{x})^2}{n}}$$

This looks no easier to calculate than the ‘mean absolute deviation from the mean’;
however, the formula for variance can be simplified (see appendix at the end of this
chapter) to give:

$$\text{Variance} = \frac{\Sigma x^2}{n} - \left(\frac{\Sigma x}{n}\right)^2 = \frac{\Sigma x^2}{n} - \bar{x}^2$$

We can find the variance and standard deviation from n, $\Sigma x$ and $\Sigma x^2$, which are the
number of values, their sum and the sum of their squares, respectively. We often use the
abbreviation SD(x) to represent the standard deviation of x.

To find $\Sigma x^2$, we add up the squares of the data values. A common error is to add up the
data values and then square the answer, but this would be written as $(\Sigma x)^2$ instead.
To find each value of $x^2 f$ (which is the same as $fx^2$), we can either multiply $x^2$ by f or we can
multiply x by xf. If we multiply x by f and then square the answer, we will obtain $(xf)^2 = x^2 f^2$,
which is not required.


A low standard deviation indicates that most values are close to the mean, whereas a 
high standard deviation indicates that the values are widely spread out from the mean. 


Consider a drinks machine that is supposed to dispense 400 ml of coffee per cup. We would 
expect some variation in the amount dispensed, yet if the standard deviation is high then some 
customers are likely to feel cheated and some risk being injured because of their overflowing cups! 


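KEY POINT 3.3

For ungrouped data:

$$\text{Standard deviation} = \sqrt{\text{Variance}} = \sqrt{\frac{\Sigma(x-\bar{x})^2}{n}} = \sqrt{\frac{\Sigma x^2}{n} - \bar{x}^2}, \text{ where } \bar{x} = \frac{\Sigma x}{n}.$$

For grouped data:

$$\text{Standard deviation} = \sqrt{\text{Variance}} = \sqrt{\frac{\Sigma(x-\bar{x})^2 f}{\Sigma f}} = \sqrt{\frac{\Sigma x^2 f}{\Sigma f} - \bar{x}^2}, \text{ where } \bar{x} = \frac{\Sigma xf}{\Sigma f}.$$

We can remember the formula for variance as ‘mean of the squares minus square of the mean’.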
WORKED EXAMPLE 3.6

For the set of five numbers 3, 9, 15, 24 and 29, find:

a the standard deviation

b which of the five numbers are more than one standard deviation from the mean.


Answer

a $\text{Variance} = \frac{\Sigma x^2}{n} - \left(\frac{\Sigma x}{n}\right)^2 = \frac{3^2+9^2+15^2+24^2+29^2}{5} - \left(\frac{80}{5}\right)^2 = 346.4 - 16^2 = 90.4$

We subtract the square of the mean from the mean of the squares to find the variance.

Standard deviation = $\sqrt{90.4}$ = 9.51

We take the square root of the variance to find the standard deviation, correct to 3 significant figures.

b 16 − 9.51 = 6.49
16 + 9.51 = 25.51

We find the values that are 9.51 below and 9.51 above, using the mean of 16.

The numbers are 3 and 29.

We identify which of the five numbers are outside the range 6.49 < number < 25.51.
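
As a check on this worked example, the two forms of the variance formula can be compared numerically. The sketch below uses invented function names; it is an illustration, not part of the worked solution.

```python
from math import sqrt


def variance_from_deviations(values):
    """Mean squared deviation from the mean."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / n


def variance_from_totals(n, sum_x, sum_x2):
    """'Mean of the squares minus square of the mean'."""
    return sum_x2 / n - (sum_x / n) ** 2


data = [3, 9, 15, 24, 29]
n, sum_x, sum_x2 = len(data), sum(data), sum(x * x for x in data)
mean = sum_x / n

var_a = variance_from_deviations(data)
var_b = variance_from_totals(n, sum_x, sum_x2)
sd = sqrt(var_b)

# Values lying more than one standard deviation from the mean.
outside = [x for x in data if abs(x - mean) > sd]

print(round(var_a, 1), round(var_b, 1), round(sd, 2), outside)  # 90.4 90.4 9.51 [3, 29]
```

Python's standard library also provides `statistics.pvariance` and `statistics.pstdev`, which divide by n in the same way as the formulae in this section.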
WORKED EXAMPLE 3.7

Find the standard deviation of the values of x given in the following table, correct
to 3 significant figures.

| x | 12 | 14 | 16 |
| f | 13 | 28 | 10 |

Answer

| x | f | xf | $x^2 f$ |
| 12 | 13 | 156 | 12 × 156 = 1872 |
| 14 | 28 | 392 | 14 × 392 = 5488 |
| 16 | 10 | 160 | 16 × 160 = 2560 |
| | $\Sigma f = 51$ | $\Sigma xf = 708$ | $\Sigma x^2 f = 9920$ |

Note how the values of $x^2 f$ are calculated.

$$\text{SD}(x) = \sqrt{\frac{\Sigma x^2 f}{\Sigma f} - \bar{x}^2} = \sqrt{\frac{9920}{51} - \left(\frac{708}{51}\right)^2} = 1.34$$

Always use the exact value of the mean to calculate variance and standard deviation.


What happens if we use a rounded value for the mean?
Correct to 1 decimal place, the mean in Worked example 3.7 is 708 ÷ 51 = 13.9.

If we use $\bar{x} = 13.9$ in our calculation, we obtain $\text{SD}(x) = \sqrt{\frac{9920}{51} - 13.9^2} = 1.14$. This is an
error of 0.2.

The rounded mean has caused a substantial error (0.2 is about 15% of the correct
value 1.34). So, when calculating the variance or standard deviation, always use $\frac{\Sigma x}{n}$ or
$\frac{\Sigma xf}{\Sigma f}$, rather than a rounded value for the mean.
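
The effect of rounding the mean can be reproduced with the totals from Worked example 3.7. The short sketch below (variable names are illustrative) compares the standard deviation found with the exact mean against the one found with the mean rounded to 1 decimal place.

```python
from math import sqrt

# Totals from Worked example 3.7.
sum_f, sum_xf, sum_x2f = 51, 708, 9920

exact_mean = sum_xf / sum_f
rounded_mean = round(exact_mean, 1)  # 13.9

sd_exact = sqrt(sum_x2f / sum_f - exact_mean ** 2)
sd_rounded = sqrt(sum_x2f / sum_f - rounded_mean ** 2)

# Relative error caused by rounding the mean before squaring it.
relative_error = (sd_exact - sd_rounded) / sd_exact

print(round(sd_exact, 2), round(sd_rounded, 2))  # 1.34 1.14
print(f"{relative_error:.0%}")                   # 15%
```

The error is large because the variance is a small difference between two much larger numbers, so a small change in the mean has a big effect on the result.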


When data are grouped, actual values cannot be seen, but we can calculate estimates of the
variance and standard deviation. The formulae in Key point 3.3 are used to do this, where x
now represents class mid-values and $\bar{x} = \frac{\Sigma xf}{\Sigma f}$ is a calculated estimate of the mean.
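WORKED EXAMPLE 3.8

| Class | 1.2– | 1.4– | 1.5–1.7 |
| f | 2 | 12 | 6 |

Answer

We extend the frequency table to include class mid-values (x), and to find the totals $\Sigma f$, $\Sigma xf$ and $\Sigma x^2 f$, as
shown in the following table.

| Class | 1.2– | 1.4– | 1.5–1.7 | |
| f | 2 | 12 | 6 | $\Sigma f = 20$ |
| Mid-value (x) | 1.3 | 1.45 | 1.6 | |
| xf | 2.6 | 17.4 | 9.6 | $\Sigma xf = 29.6$ |
| $x^2 f$ | 3.38 | 25.23 | 15.36 | $\Sigma x^2 f = 43.97$ |

$$\text{Estimate of standard deviation} = \sqrt{\frac{43.97}{20} - \left(\frac{29.6}{20}\right)^2} = 0.09$$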
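
The estimate in Worked example 3.8 (classes 1.2–1.4, 1.4–1.5 and 1.5–1.7 with frequencies 2, 12 and 6) can be reproduced by letting each class mid-value stand in for the values in that class. This is a sketch with an invented function name and an assumed input layout of (lower boundary, upper boundary, frequency).

```python
from math import sqrt


def grouped_estimates(classes):
    """Estimate the mean and standard deviation from (lower, upper, frequency) classes."""
    sum_f = sum(f for _, _, f in classes)
    # Each class is represented by its mid-value.
    sum_xf = sum((lo + hi) / 2 * f for lo, hi, f in classes)
    sum_x2f = sum(((lo + hi) / 2) ** 2 * f for lo, hi, f in classes)
    mean = sum_xf / sum_f
    return mean, sqrt(sum_x2f / sum_f - mean ** 2)


classes = [(1.2, 1.4, 2), (1.4, 1.5, 12), (1.5, 1.7, 6)]
mean, sd = grouped_estimates(classes)
print(round(mean, 2), round(sd, 2))  # 1.48 0.09
```

These are estimates only, because the actual values within each class are not known.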


Four students analysed data that they had collected. Their findings are given below.


1 Property prices in a certain area of town have a high standard deviation. 
2 The variance of the monthly sales of a particular product last year was high. 


3 The standard deviation of students’ marks in a particular examination was 


close to zero. 
4 The times taken to perform a new medical procedure have a low variance. 


Discuss the students’ findings and give a possible description of each of the following.


1 The type of environment and the people purchasing property in this area 
of town. 


2 The type of product being sold. 
3 The usefulness of the examination. 


4 The efficiency of the teams performing the medical procedures. 
Although standard deviation is far more commonly used as a measure of variation than
the interquartile range, it may not always be ideal because it can be significantly affected
by extreme values. The interquartile range may be better, and a box-and-whisker diagram
is often much more useful as a visual representation of data than the mean and standard
deviation.


Some features of the standard deviation are compared to the interquartile range in the
following table.

| Advantages | Disadvantages |
| Much simpler to calculate than the IQR. | Far more affected by extreme values than the IQR. |
| Data values do not have to be ordered. | Gives greater emphasis to large deviations than to small deviations. |
| Easier to work with algebraically when doing more advanced work. | |
| Takes account of all data values. | |

EXERCISE 3B 


1 Find the mean and the standard deviation for these sets of numbers. 
a 27, 43, 29, 34, 53, 37, 19 and 58.
b 6.2, −8.5, 7.7, −4.3, 13.5 and −11.9.


2 Last term Abraham sat three tests in each of his science subjects. His raw percentage marks for the tests, in 
the order they were completed, are listed. 


21 33 45 41 53 65 51 63 75 


a Calculate the variance of Abraham’s marks in each of the three subjects. 


b Comment on the three values obtained in part a. Do the same comments apply to Abraham’s mean mark 
for the tests in the three subjects? Justify your answer. 


3 The following table shows the number of pets owned by each of 35 families. 


Find the mean and variance of the number of pets.


4 The numbers of cobs produced by 360 maize plants are shown in the following table. 


11 75 185 81 8 


a Calculate the mean and the standard deviation. 


b Find the interquartile range and give an example of what it tells us about this dataset that the standard 
deviation does not tell us. 
5 The times spent, in minutes, by 30 girls and by 40 boys on an assignment are detailed in the following table.


a For the boys and for the girls, calculate estimates of the mean and standard deviation.
b It is required to make a comparison between the times spent by the two groups.
i What do the means tell us about the times spent? 


ii Use the standard deviations to compare the times spent by the two groups. 


6 The lengths, correct to the nearest centimetre, of 50 rods are given in the following table. 


15-17 18—24 25-29 30-37 


Calculate an estimate of the standard deviation of the lengths. 


7 For the dataset denoted by x in the following table, k is a constant.

| x | 15 | 16 | 17 | 18 | 19 | 20 |
| f | 2k | k + 5 | k − 3 | 10 | 8 | 3 |

Find the value of k and calculate the variance of x, given that $\bar{x} = 17$.


8 The following table illustrates the heights, in centimetres, of 150 children. 


| 140 up to 144 | a |
| 144 up to 150 | b |
| 150 up to 160 | 69 |
| 160 up to 165 | 28 |


a Given that a calculated estimate of the mean height is exactly 153.14 cm, show that 142a + 147b = 7726, and
evaluate a and b.


b Calculate an estimate of the standard deviation of the heights. 
(PS) 9 Kristina plans to raise money for charity. Her plan is to walk 217 km in 7 days so that she walks
$k + 2n - n^2$ km on the nth day. Find the standard deviation of the daily distances she plans to walk, and
compare this with the interquartile range.


(PS) 10 The mass of waste produced by a school during its three 13-week terms is given in tonnes, correct to 2 decimal
places, in the following table.

| 0.15–0.29 | 0.30–0.86 | 0.87–1.35 | 1.36–2.00 |
| 5 | 8 | 20 | 6 |


a Calculate estimates of the mean and standard deviation of the mass of waste produced per week, giving
both answers correct to 2 decimal places.


b No waste is produced in the 13 weeks of the year that the school is closed. If this additional
data is included in the calculations, what effect does it have on the mean and on the standard deviation?
(PS) 11 The ages, in whole numbers of years, of a hotel’s 50 staff are given in the following table. Calculated estimates
of the mean and variance are 37.32 and 69.1176, respectively.

| 23–30 | 31–37 | 38–45 | 46–59 |
| 14 | x | y | 6 |


Exactly 1 year after these calculations were made, Gudrun became the 51st staff member and the mean age
became exactly 38 years. Find Gudrun’s age on the day of her recruitment, and determine what effect this had
on the variance of the staff’s ages. What assumptions must be made to justify your answers?


(PS) 12 Refer to the following diagram. In position 1, a 10-metre rod is placed 10 metres from a fixed point, P. Six
small discs, A to F, are evenly spaced along the length of the rod. The rod is rotated anti-clockwise about its
centre by α = 30° to position 2. The distances from P to the discs are denoted by x.

[Diagram: Position 1 and Position 2 of the rod, showing discs A to F, the fixed point P, the 10 m distances and the angle α.]

a What effect does the 30° rotation have on values of x? Investigate this by first considering the effect on the
average distance from P to the discs.


b Find two values that can be used as measures of the change in the variation of x caused by the rotation.


c Use the values obtained in parts a and b to summarise the changes in the distances from P to the discs
caused by the rotation.


d Can you prove that $\Sigma x^2$ is constant for all values of α? (Hint: to do this, you need only to show that $\Sigma x^2$ is
constant for 0° < α < 90°).


EXPLORE 3.4 


Twenty adults completed as many laps of a running track as they could manage in 
30 minutes. The following table shows how many laps they completed. 


| No. laps | 4–8 | 9–13 | 14–18 |
| No. adults (f) | 6 | 10 | 4 |


Two students, Andrea and Billie, were asked to calculate an estimate of the standard
deviation. Their working and answers, which you should check carefully, are shown below.

Andrea: $\sqrt{\frac{(6^2 \times 6)+(11^2 \times 10)+(16^2 \times 4)}{20} - 10.5^2} = 3.5$

Billie: $\sqrt{\frac{(6.5^2 \times 6)+(11.5^2 \times 10)+(16.5^2 \times 4)}{20} - 11^2} = 3.5$

Compare Andrea and Billie’s approaches. What have they done differently and why do
you think they did so? Is one of their answers better than the other? If so, in what way?
Calculating from totals


For ungrouped data, we calculate variance (Var) and standard deviation (SD) from totals n,
$\Sigma x$ and $\Sigma x^2$.


For grouped data, we calculate using totals $\Sigma f$, $\Sigma xf$ and $\Sigma x^2 f$.


In both cases, we can rearrange the formula for variance if we wish to evaluate one of the totals.


WORKED EXAMPLE 3.9


Given that n = 25, $\Sigma x = 275$ and Var(x) = 7, find $\Sigma x^2$.


Answer

$$\frac{\Sigma x^2}{25} - \left(\frac{275}{25}\right)^2 = 7$$

$$\Sigma x^2 = 25 \times \left[7 + \left(\frac{275}{25}\right)^2\right] = 3200$$

Substitute the given values into the formula for variance, then rearrange the terms to
make $\Sigma x^2$ the subject.
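
The rearrangement used here, $\Sigma x^2 = n\left[\text{Var}(x) + \left(\frac{\Sigma x}{n}\right)^2\right]$, can be checked numerically. The helper name below is invented for this sketch.

```python
def sum_of_squares(n, sum_x, variance):
    """Rearranged variance formula: the sum of squares is n * (Var + mean^2)."""
    return n * (variance + (sum_x / n) ** 2)


total = sum_of_squares(25, 275, 7)
print(total)  # 3200.0

# Substituting back should recover the given variance.
check = total / 25 - (275 / 25) ** 2
print(check)  # 7.0
```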


Combined sets of data

In Chapter 2, Section 2.2, sets of data were combined by simply considering all of their
values together, and we learned how to find the mean. Here, we consider the variation of
datasets that have been combined in the same way.


The variance and standard deviation of a combined dataset are calculated from its totals,
which are the sums of the totals of the two sets from which it has been made.


The two sets {1, 2, 3, 4} and {4, 5, 6} individually have variances of $1\frac{1}{4}$ and $\frac{2}{3}$. The combined
set {1, 2, 3, 4, 4, 5, 6} has a variance of approximately 2.53.


WORKED EXAMPLE 3.10


The heights, x cm, of 10 boys are summarised by $\Sigma x = 1650$ and $\Sigma x^2 = 275490$.

The heights, y cm, of 15 girls are summarised by $\Sigma y = 2370$ and $\Sigma y^2 = 377835$.

Calculate, to 3 significant figures, the standard deviation of the heights of all 25 children together.

Answer

$\Sigma x^2 + \Sigma y^2 = 275490 + 377835 = 653325$
$\Sigma x + \Sigma y = 1650 + 2370 = 4020$

For the 25 children, we find the sum of the squares of their heights and the sum of their heights.

$$\text{SD} = \sqrt{\frac{653325}{25} - \left(\frac{4020}{25}\right)^2} = 16.6 \text{ cm}$$

We substitute the three sums into the formula for standard deviation and evaluate this to the required degree of accuracy.


The variance of two combined datasets x and y is not (in general) equal to Var(x) + Var(y).

In Worked example 3.10, Var(boys) = 324 and Var(girls) = 225 but Var(boys and girls) ≠ 324 + 225.
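
The pooling of totals can be sketched as follows, using the two small sets from the text and the totals of Worked example 3.10, with 10 boys and 15 girls making the 25 children. Function names are illustrative only.

```python
from math import sqrt


def combined_variance(n_x, sum_x, sum_x2, n_y, sum_y, sum_y2):
    """Variance of two datasets pooled together, computed from their totals."""
    n = n_x + n_y
    return (sum_x2 + sum_y2) / n - ((sum_x + sum_y) / n) ** 2


def totals(values):
    """Return (n, sum of values, sum of squared values)."""
    return len(values), sum(values), sum(v * v for v in values)


# The two small sets from the text.
small = combined_variance(*totals([1, 2, 3, 4]), *totals([4, 5, 6]))
print(round(small, 2))  # 2.53

# Worked example 3.10.
var_all = combined_variance(10, 1650, 275490, 15, 2370, 377835)
print(round(sqrt(var_all), 1))  # 16.6
```

Note that the combined variance (about 276.4) is not the sum of the separate variances, 324 and 225.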
If two sets of data, denoted by x and y, have $n_x$ and $n_y$ values, respectively, then the mean
and variance of their combined values are found using the totals $(n_x + n_y)$, $(\Sigma x + \Sigma y)$ and
$(\Sigma x^2 + \Sigma y^2)$.


We can rearrange the formulae in Key point 3.4 if we wish to find one of the totals involved. 


WORKED EXAMPLE 3.11


In an examination, the percentage marks of the 120 boys are denoted by x, and the percentage marks of the 80
girls are denoted by y.
The marks are summarised by the totals $\Sigma x = 7020$, $\Sigma x^2 = 424320$ and $\Sigma y^2 = 352130$.


Calculate the girls’ mean mark, given that the standard deviation for all these students is 10.


Answer

$$\frac{424320+352130}{120+80} - \left(\frac{7020+\Sigma y}{120+80}\right)^2 = 10^2$$

$$3882.25 - \frac{(7020+\Sigma y)^2}{40000} = 100$$

$$155290000 - (7020+\Sigma y)^2 = 4000000$$

$$(7020+\Sigma y)^2 = 151290000$$

$$\Sigma y = \sqrt{151290000} - 7020$$

$$\Sigma y = 5280$$

Girls’ mean mark = $\frac{5280}{80}$ = 66
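
The rearrangement in this worked example can be sketched in code. The function name is invented for illustration, and the sketch assumes that the combined total $\Sigma x + \Sigma y$ is positive (true for marks), so that the positive square root is the one required.

```python
from math import sqrt


def missing_sum(n_x, sum_x, sum_x2, n_y, sum_y2, sd_all):
    """Find the sum of y from the combined standard deviation and the other totals."""
    n = n_x + n_y
    # Rearranged variance formula: (combined mean)^2 = mean of squares - variance.
    mean_squared = (sum_x2 + sum_y2) / n - sd_all ** 2
    return n * sqrt(mean_squared) - sum_x


sum_y = missing_sum(120, 7020, 424320, 80, 352130, 10)
print(sum_y, sum_y / 80)  # 5280.0 66.0
```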


EXERCISE 3C 


1 Given that: 
a $\Sigma v^2 = 5480$, $\Sigma v = 288$ and n = 64, find the variance of v.

b $\Sigma w^2 = 4000$, $\bar{w} = 5.2$ and n = 36, find the standard deviation of w.

c $\Sigma x^2 f = 6120$, $\Sigma f = 40$ and the standard deviation of x is 12, find $\Sigma xf$.

d $\Sigma xf = 2800$, $\Sigma f = 50$ and the variance of x is 100, find $\Sigma x^2 f$.

e $\Sigma t^2 = 193144$, $\Sigma t = 2324$ and that the standard deviation of t is 3, find the
number of data values of t.
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BPs) 


10 


A building is occupied by n companies. The number of people employed by 
these companies 1s denoted by x. Find the mean number of employees, given that 
Ex? = 8900, Xx = 220 and that the standard deviation of x is 18. 


Twenty-five values of p are such that Zp? = 6006 and Ep = 388, and 25 values of 


q are such that £q? = 6114 and £q = 387. Calculate the variance of the 50 values 
of p and q together. 


In a class of 30 students, the mean mass of the 14 boys is 63.5kg and the mean 
mass of the girls is 57.3kg. Calculate the mean and standard deviation of the 
masses of all the students together, given that the sums of the squares of the 
masses of the boys and girls are 58444 kg” and 56222 kg?, respectively. 


The following table shows the froni tyre pressure, in psi, of five 4-wheeled 
vehicles, A to E. 


24 27 31 30 26 


a Show that the variance of the pressure in all of these front tyres is 7.65 psi. 


b Rear tyre pressures for these five vehicles are denoted by x. Given that £x? = 7946 
and that the variance of the pressures in all of the front and rear tyres on these 
five vehicles together is 31.6275 psi’, find the mean pressure in all the rear tyres. 


The totals Ex? = 7931, Ex =397 and Ly=499 are given by 29 values of x and n 
values of y. All the values of x and y together have a variance of 52. 


a Express Ly” in terms of n. 


b Find the value of n for which Zy? — Xx? = 10. 

The five values in a dataset have a sum of 250 and standard deviation of 15 
A sixth value is added to the dataset, such that the mean is now 40. Find the 
variance of the six values in the dataset. 


A group of 10 triends played a mini-golf competition. Eight of the friends tied 
for second place, each with a score of 34, and the other two friends tied for first 
place. Find the winning score, given that the standard deviation of the scores of 
all 10 friends was 1.2 and that the lowest score in golf wins. 


9 An author has written 15 children’s books. The first eight books that she wrote
contained between 240 and 250 pages each. The next six books contained
between 180 and 190 pages each. Correct to 1 decimal place, the standard
deviation of the number of pages in the 15 books together is 31.2.

Show that it is not possible to determine a specific calculated estimate of the
number of pages in the author’s 15th book.


(PS) 10 A set of n pieces of data has mean $\bar{x}$ and standard deviation S.

Another set of 2n pieces of data has mean $\bar{x}$ and standard deviation $\frac{1}{2}S$.

Find the standard deviation of all these pieces of data together in terms of S.


FAST FORWARD

We will see how standard deviation and probabilities are linked in the normal distribution in
Chapter 8, Section 8.2.
Note that Buti’s marks are consistently 1 less than Amber’s and that Chen’s marks are
consistently 3 more than Amber’s. This is indicated in the last column of the table.


For each student, calculate the variance and standard deviation. 


Can you explain your results, and do they apply equally to the range and 
interquartile range? 


Coded data 

What effect does addition of a constant to all the values in a dataset have on its variation?
And how can we find the variance and standard deviation of the original data from the
coded data?


REWIND

We saw in Chapter 2, Section 2.2 that the mean of a set of data can be found from a
coded total such as \(\Sigma(x - b)\).


In the Explore 3.5 activity, you discovered that the datasets x, x − 1 and x + 3 have
identical measures of variation. The effect of adding −1 or +3 is to translate the whole
set of values, which has no effect on the pattern of spread, as shown in the following
diagram. The marks of the three students have the same variance and the same standard
deviation.


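KEY POINT 3.5


For ungrouped data:

\[\frac{\Sigma x^2}{n} - \bar{x}^2 = \frac{\Sigma (x-b)^2}{n} - \left(\frac{\Sigma (x-b)}{n}\right)^2\]


For grouped data:

\[\frac{\Sigma x^2 f}{\Sigma f} - \bar{x}^2 = \frac{\Sigma (x-b)^2 f}{\Sigma f} - \left(\frac{\Sigma (x-b) f}{\Sigma f}\right)^2\]


These formulae can be summarised by writing Var(x) = Var(x − b).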
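
The result Var(x) = Var(x − b) can be checked numerically. The following is a minimal sketch in Python; the list `marks` and the helper name `coded_variance` are made up for illustration and do not come from the text.

```python
from statistics import pvariance

def coded_variance(values, b):
    """Population variance found from the coded values (x - b) only."""
    coded = [x - b for x in values]
    n = len(coded)
    # sum of coded squares over n, minus the square of the coded mean
    return sum(c * c for c in coded) / n - (sum(coded) / n) ** 2

marks = [12, 15, 11, 18, 14]           # hypothetical data
direct = pvariance(marks)              # variance of the original values
shifted = coded_variance(marks, b=10)  # variance found from x - 10
```

Here `direct` and `shifted` agree (up to floating-point rounding), and the same holds for any other choice of b.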
For two datasets coded as (x − a) and (y − b), we can use the coded totals \(\Sigma(x - a)\), \(\Sigma(x - a)^2\),
\(\Sigma(y - b)\) and \(\Sigma(y - b)^2\) to find \(\Sigma x\), \(\Sigma x^2\), \(\Sigma y\) and \(\Sigma y^2\), from which we can find the variance of
the combined set of values of x and y.


Eight values of x are summarised by the totals \(\Sigma(x - 10)^2 = 1490\) and \(\Sigma(x - 10) = 100\).
Twelve values of y are summarised by the totals \(\Sigma(y + 5)^2 = 5139\) and \(\Sigma(y + 5) = 234\).


Find the variance of the 20 values of x and y together. 


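Answer

We find the totals \(\Sigma x\) and \(\Sigma x^2\).

\(\bar{x} = \frac{100}{8} + 10 = 22.5\), so \(\Sigma x = 8 \times 22.5 = 180\).

\[\text{Var}(x) = \text{Var}(x - 10) = \frac{1490}{8} - \left(\frac{100}{8}\right)^2 = 30\]

\(\frac{\Sigma x^2}{8} - 22.5^2 = 30\), so \(\Sigma x^2 = 4290\).

We find the totals \(\Sigma y\) and \(\Sigma y^2\).

\(\bar{y} = \frac{234}{12} - 5 = 14.5\), so \(\Sigma y = 12 \times 14.5 = 174\).

\[\text{Var}(y) = \text{Var}(y + 5) = \frac{5139}{12} - \left(\frac{234}{12}\right)^2 = 48\]

\(\frac{\Sigma y^2}{12} - 14.5^2 = 48\), so \(\Sigma y^2 = 3099\).

\[
\begin{aligned}
\text{Var}(x \text{ and } y) &= \frac{\Sigma x^2 + \Sigma y^2}{8 + 12} - \left(\frac{\Sigma x + \Sigma y}{8 + 12}\right)^2 \\
&= \frac{4290 + 3099}{20} - \left(\frac{180 + 174}{20}\right)^2 \\
&= 56.16
\end{aligned}
\]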
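
The same answer can be reproduced in code. The sketch below is illustrative only: the function names are hypothetical, and it recovers Σx² by expanding Σ(x − b)² = Σx² − 2bΣx + nb² rather than by going through the variance as the worked answer does.

```python
def decode_totals(n, sum_coded, sum_coded_sq, b):
    """Recover (sum of x, sum of x squared) from the totals of (x - b) and (x - b)^2."""
    sum_x = sum_coded + n * b
    # rearranged from: sum (x-b)^2 = sum x^2 - 2b * sum x + n * b^2
    sum_x_sq = sum_coded_sq + 2 * b * sum_x - n * b * b
    return sum_x, sum_x_sq

def combined_variance(n_x, sum_x, sum_x_sq, n_y, sum_y, sum_y_sq):
    """Variance of two datasets taken together, from their uncoded totals."""
    n = n_x + n_y
    mean = (sum_x + sum_y) / n
    return (sum_x_sq + sum_y_sq) / n - mean ** 2

sx, sxx = decode_totals(8, 100, 1490, b=10)    # x is coded as x - 10
sy, syy = decode_totals(12, 234, 5139, b=-5)   # y is coded as y + 5
var_xy = combined_variance(8, sx, sxx, 12, sy, syy)
```

The decoded totals are 180, 4290, 174 and 3099, matching the worked answer, and `var_xy` evaluates to 56.16 up to floating-point rounding.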


WORKED EXAMPLE 3.12 


It is known that 20 girls each have at least one brother. The number of brothers that they have is denoted by x. 
Information about the values of x—1 is given in the following table. 


x − 1            0    1    2    3    4
Frequency (f)    2    4    8    5    1


Use the coded values to calculate the standard deviation of the number of brothers, to 3 decimal places. 


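Answer

We extend the frequency table to find the necessary totals.

x − 1          0    1    2    3    4
f              2    4    8    5    1    Σf = 20
(x − 1)f       0    4   16   15    4    Σ(x − 1)f = 39
(x − 1)²f      0    4   32   45   16    Σ(x − 1)²f = 97

\[
\begin{aligned}
\text{SD}(x) = \text{SD}(x - 1) &= \sqrt{\frac{\Sigma (x-1)^2 f}{\Sigma f} - \left(\frac{\Sigma (x-1) f}{\Sigma f}\right)^2} \\
&= \sqrt{\frac{97}{20} - \left(\frac{39}{20}\right)^2} \\
&= 1.023
\end{aligned}
\]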
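
As a check on the arithmetic, the totals and the standard deviation can be computed directly from the coded frequency table. This is a short sketch; the variable names are chosen here for illustration.

```python
from math import sqrt

coded = [0, 1, 2, 3, 4]   # values of x - 1
freq = [2, 4, 8, 5, 1]    # number of girls with each coded value

n = sum(freq)                                         # total frequency
sum_cf = sum(c * f for c, f in zip(coded, freq))      # total of (x - 1) f
sum_c2f = sum(c * c * f for c, f in zip(coded, freq)) # total of (x - 1)^2 f

# SD(x) = SD(x - 1), so the coded totals are all that is needed
sd = sqrt(sum_c2f / n - (sum_cf / n) ** 2)
```

The totals come out as 20, 39 and 97, and `round(sd, 3)` gives 1.023.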


EXERCISE 3D 


1 Two years ago, the standard deviation of the masses of a group of men and a 
group of women were 8kg and 6kg, respectively. Today, all the men are 5kg 
heavier and all the women are 3kg lighter. Find the standard deviation for each 
group today. 


2 Twenty readings of y are summarised by the totals \(\Sigma(y - 5)^2 = 890\) and
\(\Sigma(y - 5) = 130\). Find the standard deviation of y.


3 The amounts of rainfall, r mm, at a certain location were recorded on 365
consecutive days and are summarised by \(\Sigma(r - 3)^2 = 9950\) and \(\Sigma(r - 3) = 1795.8\).
Calculate the mean daily rainfall and the value of \(\Sigma r^2\).


4 Exactly 20 years ago, the mean age of a group of boys was 15.7 years and the 
sum of the squares of their ages was 16000. If the sum of the squares of their 
ages has increased by 8224 in this 20-year period, find the number of boys in 
the group. 


5 Readings from a device, denoted by y, are such that \(\Sigma(y - 3)^2 = 2775\), \(\Sigma y = 105\) and
the standard deviation of y is 13. Find the number of readings that were taken.


(M) 6 Mei measured the heights of her classmates and, after correctly analysing her data,
she found the mean and standard deviation to be 163.8 cm and 7.6 cm. Decide
whether or not these measures are valid, given the fact that Mei measured all the
heights from the end of the tape measure, which is exactly 1.2 cm from the zero
mark.

Explain your answers.


7 A transport company runs 21 coaches between two cities every week. In the
past, the mean and variance of the journey times were 4 hours 35 minutes and
53.29 minutes².


What would be the mean and standard deviation of the times if all the coaches 
departed 10 minutes later and arrived 5 minutes earlier than in the past? 


Are there any situations in which achieving this might actually be possible? 
(PS) 8 During a sale, a boy bought six pairs of jeans, each with leg length x cm. He
also bought four pairs of pants, each with leg length (x − 2) cm. The boy is quite
short, so his father removed 4 cm from the length of each trouser leg. Find the
variance of the leg lengths after his father made the alterations.


9 a Find the mean and standard deviation of the first seven positive even
integers.

b Without using a calculator, write down the mean and standard deviation of
the first seven positive odd integers.

c Find an expression in terms of n for the variance of the first n positive even
integers. What other measure can be found using this expression?


FAST FORWARD

You will study the variance of linear combinations of random variables in the
Probability & Statistics 2 Coursebook, Chapter 3.


10 Each year Upchester United plays against Upchester City in a local derby
match. The number of goals scored in a match by United is denoted by u and the
number of goals scored in a match by City is denoted by c. The number of goals
scored in the past 15 matches are summarised by \(\Sigma(u - 1)^2 = 25\), \(\Sigma(u - 1) = 9\),
\(\Sigma c^2 = 39\) and \(\Sigma c = 19\).


a How many goals have been scored altogether in these 15 matches? 
b Show that \(\Sigma u^2 = 58\).
c Find, correct to 3 decimal places, the variance of the number of goals scored 
by the two teams together in these 15 matches. 
11 Twenty values of x are summarised by \(\Sigma(x - 1)^2 = 132\) and \(\Sigma(x - 1) = 44\).
Eighty values of y are summarised by \(\Sigma(y + 1)^2 = 17\,704\) and \(\Sigma(y + 1) = 1184\).
a Show that \(\Sigma x = 64\) and that \(\Sigma x^2 = 240\).


b Calculate the value of \(\Sigma y\) and of \(\Sigma y^2\).
c Find the exact variance of the 100 values of x and y combined.
12 The heights, x cm, of 200 boys and the heights, y cm, of 300 girls are summarised 
by the following totals: 


\(\Sigma(x - 160)^2 = 18\,240\), \(\Sigma(x - 160) = 1820\), \(\Sigma(y - 150)^2 = 20\,100\), \(\Sigma(y - 150) = 2250\).
a Find the mean height of these 500 children.


b By first evaluating \(\Sigma x^2\) and \(\Sigma y^2\), find the variance of the heights of the 500
children, including appropriate units with your answer.


What effect does multiplication of all the values in a dataset have on its variation? And
how can we find the variance and standard deviation of the original values from the
coded data?


REWIND

We saw in Chapter 2, Section 2.2 that the mean of a set of data can be found from a
coded total such as \(\Sigma(ax - b)\).


Consider the total cost of hiring a taxi for which a customer pays a fixed charge of $3 plus
$4 per kilometre travelled. Using y for the total cost and x for the distance travelled in
kilometres, the cost can be calculated from the equation y = 4x + 3. Some example values
are shown in the following table.

When the distance changes or varies by +1, the cost changes or varies by +4.
Variation in cost is affected by multiplication (×4) but not by addition (+3).


If we consider the graph of y= 4x + 3, then it is only the gradient of the line that affects the 
variation of y. If we increase the x-coordinate of a point on the line by 1, its y-coordinate 
increases by 4. 


Multiplying x= {1,2,3} by 4 ‘stretches’ the whole set to {4, 8,12}, which affects the pattern 
of spread. 


Adding +3 to {4,8,12} simply translates the whole set to y= {7, 11,15}, which has no effect 
on the pattern of spread. 


Journeys of 1, 2 and 3km, costing $7, $11 and $15 are represented in the following 
diagram. 


[Diagram: the distances x and the costs y = 4x + 3 shown on two number lines, labelled Distance and Cost.]


In the diagram, we see that the range of y is 4 times the range of x, so the range of x is \(\frac{1}{4}\)
times the range of y. For x = {1, 2, 3} and 4x + 3 = {7, 11, 15}, you should check to confirm
the following results:


\(\text{SD}(x) = \frac{1}{4} \times \text{SD}(4x + 3)\) and \(\text{Var}(x) = \frac{1}{16} \times \text{Var}(4x + 3)\).
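
One way to carry out this check is with Python's standard `statistics` module, whose `pstdev` and `pvariance` functions divide by n, as the formulae in this chapter do. The sketch below uses the taxi example's values; the variable names are chosen here for illustration.

```python
from statistics import pstdev, pvariance

x = [1, 2, 3]                  # distances in km
y = [4 * v + 3 for v in x]     # costs in dollars: 7, 11, 15

range_x = max(x) - min(x)      # 2
range_y = max(y) - min(y)      # 8, which is 4 times the range of x

sd_ratio = pstdev(y) / pstdev(x)         # multiplying by 4 scales the SD by 4
var_ratio = pvariance(y) / pvariance(x)  # and scales the variance by 4 squared
```

The ratios come out as 4 and 16 (up to floating-point rounding); the added constant 3 plays no part in either.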


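KEY POINT 3.6


For ungrouped data:

\[\frac{\Sigma x^2}{n} - \bar{x}^2 = \frac{1}{a^2}\left[\frac{\Sigma (ax-b)^2}{n} - \left(\frac{\Sigma (ax-b)}{n}\right)^2\right]\]


For grouped data:

\[\frac{\Sigma x^2 f}{\Sigma f} - \bar{x}^2 = \frac{1}{a^2}\left[\frac{\Sigma (ax-b)^2 f}{\Sigma f} - \left(\frac{\Sigma (ax-b) f}{\Sigma f}\right)^2\right]\]


These formulae can be summarised by writing \(\text{Var}(x) = \frac{1}{a^2} \times \text{Var}(ax - b)\) or \(\text{Var}(ax - b) = a^2 \times \text{Var}(x)\).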


WORKED EXAMPLE 3.14


The standard deviation of the prices of a selection of brand-name products is $24.
Imitations of these products are all sold at 25% of the brand-name price. Find the
variance of the prices of the imitations.


Answer

\[
\begin{aligned}
\text{Var}(0.25x) &= 0.25^2 \times \text{Var}(x) \\
&= 0.25^2 \times 24^2 \\
&= 36
\end{aligned}
\]

The units for variance in this case are ‘dollars squared’.


WORKED EXAMPLE 3.15


Given that \(\Sigma(3x - 1)^2 = 9136\), \(\Sigma(3x - 1) = 53\) and n = 10, find the value of \(\Sigma x^2\).


Answer

We first find \(\bar{x}\), knowing that the mean of the coded values is 1 less
than 3 times the mean of x; that is, mean(3x − 1) = \(3\bar{x} - 1\).

\[
\begin{aligned}
3\bar{x} - 1 &= \frac{53}{10} \\
\bar{x} &= \frac{1}{3} \times \left(\frac{53}{10} + 1\right) \\
&= 2.1
\end{aligned}
\]

We form and solve an equation knowing that
\(\text{Var}(x) = \frac{1}{a^2} \times \text{Var}(ax - b)\) and \(\text{Var}(3x - 1) = \frac{9136}{10} - \left(\frac{53}{10}\right)^2\).

\[
\begin{aligned}
\text{Var}(x) &= \frac{1}{3^2} \times \text{Var}(3x - 1) \\
\frac{\Sigma x^2}{10} - 2.1^2 &= \frac{1}{9}\left(\frac{9136}{10} - \frac{53^2}{10^2}\right) \\
\frac{\Sigma x^2}{10} - 4.41 &= 98.39 \\
\Sigma x^2 &= 1028
\end{aligned}
\]
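
The method of Worked example 3.15 can be sketched as a small function. The function name and parameter names below are hypothetical; the steps follow the worked answer, decoding the mean first and then the variance.

```python
def sum_of_squares_from_coded(n, a, b, sum_coded, sum_coded_sq):
    """Find the sum of x squared from the totals of (ax - b) and (ax - b)^2."""
    coded_mean = sum_coded / n
    mean_x = (coded_mean + b) / a                  # mean(ax - b) = a * mean(x) - b
    coded_var = sum_coded_sq / n - coded_mean ** 2
    var_x = coded_var / a ** 2                     # Var(x) = Var(ax - b) / a^2
    return n * (var_x + mean_x ** 2)               # since Var(x) = (sum x^2)/n - mean^2

result = sum_of_squares_from_coded(n=10, a=3, b=1, sum_coded=53, sum_coded_sq=9136)
```

With the totals from the example, `result` is 1028 up to floating-point rounding.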
EXERCISE 3E


1 The range of prices of the newspapers sold at a kiosk is $0.80. After 6 p.m. all prices are reduced by 20%. Find
the range of the prices after 6 p.m.

2 Find the standard deviation of x, given that \(\Sigma 4x^2 = 14\,600\), \(\Sigma 2x = 420\) and n = 20.


3 The values of x given in the table on the left have a standard deviation of 0.88. Find the standard deviation of
the values of y.

x            2    4    6          y            7   13   19
Frequency    a    b    c          Frequency    a    b    c


4 The temperatures, T° Celsius, at seven locations in the Central Kalahari Game Reserve were recorded at
4 p.m. one January afternoon. The values of T, correct to 1 decimal place, were:
32.1, 31.7, 31.2, 31.5, 31.9, 32.2 and 32.7.

a Evaluate \(\Sigma 10(T - 30)\) and \(\Sigma 100(T - 30)^2\).

b Use your answers to part a to calculate the standard deviation of T.

c By 5 p.m. the temperature at each location had dropped by exactly 0.75°C. Find the variance of the
temperatures at 5 p.m.


5 Building plots are offered for sale at $315 per square metre. The seller has to pay a lawyer’s fee of $500 from
the money received. Salome’s plot is 240 square metres larger than Nadia’s plot. How much more did the seller
receive from Salome than from Nadia after paying the lawyer’s fees?


6 Temperatures in degrees Celsius (°C) can be converted to temperatures in degrees Fahrenheit (°F) using the
formula F = 1.8C + 32.


a The temperatures yesterday had a range of 15°C. Express this range in degrees Fahrenheit.
b Temperatures elsewhere were recorded at hourly intervals in degrees Fahrenheit and were found to have mean


54.5 and variance 65.61. Find the mean and standard deviation of these temperatures in degrees Celsius. 


Copyright Material - Review Only - Not for Redistribution 


Chapter 3: Measures of variation 


(M) 7 Ten items were selected from each of four sections at a supermarket. Details of the prices of those items, in
dollars, on 1st April and on 1st June are shown in the following table.


For which section’s items could each of the following statements be true? Briefly explain each of your answers. 
a The total cost of the items did not change. 

b The price of each item changed by the same amount. 

c The proportional change in the price of each item was the same. 


(PS) 8 The lengths of 45 ropes used at an outdoor recreational centre can be extended by 30% when stretched.
The sum of the squares of their stretched lengths is 0.0507 km² and their natural lengths, x metres, are
summarised by \(\Sigma(x - 20)^2 = 1200\). Find the mean natural length of these 45 ropes.


(PS) 9 Over a short period of time in 2016, the value of the pound sterling (£) fell by 15.25% against the euro (€). Find
the percentage change in the value of the euro against the pound over this same period.


Appendix to Section 3.3 
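In this appendix, we show how the two formulae for variance are equivalent. For
simplicity, we will assume that n = 3.


If we denote our three numbers by \(x_1\), \(x_2\) and \(x_3\), then \(\bar{x} = \frac{1}{3}(x_1 + x_2 + x_3)\).

Variance is defined by \(\text{Var}(x) = \frac{\Sigma(x - \bar{x})^2}{n}\), so if we expand the brackets and rearrange, we get

\[
\begin{aligned}
\text{Var}(x) &= \frac{1}{3}\left[(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + (x_3 - \bar{x})^2\right] \\
&= \frac{1}{3}\left[x_1^2 - 2\bar{x}x_1 + \bar{x}^2 + x_2^2 - 2\bar{x}x_2 + \bar{x}^2 + x_3^2 - 2\bar{x}x_3 + \bar{x}^2\right] \\
&= \frac{1}{3}\left[x_1^2 + x_2^2 + x_3^2 - 2\bar{x}(x_1 + x_2 + x_3) + 3\bar{x}^2\right] \\
&= \frac{x_1^2 + x_2^2 + x_3^2}{3} - 2\bar{x}\left(\frac{x_1 + x_2 + x_3}{3}\right) + \bar{x}^2 \\
&= \frac{x_1^2 + x_2^2 + x_3^2}{3} - 2\bar{x}^2 + \bar{x}^2 \\
&= \frac{\Sigma x^2}{3} - \bar{x}^2
\end{aligned}
\]

Note: in the fourth line, the term in brackets is equal to \(\bar{x}\).

The final expression is the alternative formula \(\frac{\Sigma x^2}{n} - \bar{x}^2\) in the case where n = 3.


Try showing that the two formulae for variance are equivalent for the simple case where
n = 2, and then challenge yourself by taking on n = 4 or larger. Can you generalise this
argument to an arbitrary value of n?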
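
Before attempting a general proof, it can be reassuring to confirm that the two formulae agree on datasets of various sizes. The sketch below uses exact fractions so that no rounding is involved; the function names and the sample datasets are made up for illustration, and agreement on examples is of course not a proof.

```python
from fractions import Fraction

def var_definition(values):
    """Variance from the definition: mean of squared deviations from the mean."""
    xs = [Fraction(v) for v in values]
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / n

def var_alternative(values):
    """Variance from the alternative formula: mean of squares minus squared mean."""
    xs = [Fraction(v) for v in values]
    n = len(xs)
    mean = sum(xs) / n
    return sum(x * x for x in xs) / n - mean ** 2

samples = [[4, 9], [1, 2, 3], [2, 5, 5, 11], [7, 1, 8, 2, 8, 1]]
results = [(var_definition(s), var_alternative(s)) for s in samples]
```

Each pair in `results` contains two identical fractions, one from each formula.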
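Checklist of learning and understanding


● Commonly used measures of variation are the range, interquartile range and standard deviation.

● A box-and-whisker diagram shows the smallest and largest values, the lower and upper quartiles
and the median of a set of data.

● For ungrouped data, the median \(Q_2\) is at the \(\left(\frac{n+1}{2}\right)\)th value.

● For grouped data with total frequency \(n = \Sigma f\), the quartiles are at the following values.

Lower quartile \(Q_1\) is at \(\frac{n}{4}\) or \(\frac{1}{4}\Sigma f\).

Middle quartile \(Q_2\) is at \(\frac{n}{2}\) or \(\frac{1}{2}\Sigma f\).

Upper quartile \(Q_3\) is at \(\frac{3n}{4}\) or \(\frac{3}{4}\Sigma f\).

● \(\text{IQR} = Q_3 - Q_1\)

● For ungrouped data:

\[\text{Standard deviation} = \sqrt{\text{Variance}} = \sqrt{\frac{\Sigma (x - \bar{x})^2}{n}} = \sqrt{\frac{\Sigma x^2}{n} - \bar{x}^2}, \text{ where } \bar{x} = \frac{\Sigma x}{n}.\]

● For grouped data:

\[\text{Standard deviation} = \sqrt{\text{Variance}} = \sqrt{\frac{\Sigma (x - \bar{x})^2 f}{\Sigma f}} = \sqrt{\frac{\Sigma x^2 f}{\Sigma f} - \bar{x}^2}, \text{ where } \bar{x} = \frac{\Sigma x f}{\Sigma f}.\]

● For datasets x and y with \(n_x\) and \(n_y\) values, respectively:

\[\text{Mean} = \frac{\Sigma x + \Sigma y}{n_x + n_y} \quad \text{and} \quad \text{Variance} = \frac{\Sigma x^2 + \Sigma y^2}{n_x + n_y} - \left(\frac{\Sigma x + \Sigma y}{n_x + n_y}\right)^2.\]

● The formulae for ungrouped and grouped coded data can be summarised by:

\(\text{Var}(x) = \text{Var}(x - b)\) and

\(\text{Var}(x) = \frac{1}{a^2} \times \text{Var}(ax - b)\) or \(\text{Var}(ax - b) = a^2 \times \text{Var}(x)\)

● For ungrouped coded data:

\[\frac{\Sigma x^2}{n} - \bar{x}^2 = \frac{\Sigma (x-b)^2}{n} - \left(\frac{\Sigma (x-b)}{n}\right)^2 \quad \text{and} \quad \frac{\Sigma x^2}{n} - \bar{x}^2 = \frac{1}{a^2}\left[\frac{\Sigma (ax-b)^2}{n} - \left(\frac{\Sigma (ax-b)}{n}\right)^2\right]\]

● For grouped coded data:

\[\frac{\Sigma x^2 f}{\Sigma f} - \bar{x}^2 = \frac{\Sigma (x-b)^2 f}{\Sigma f} - \left(\frac{\Sigma (x-b) f}{\Sigma f}\right)^2 \quad \text{and} \quad \frac{\Sigma x^2 f}{\Sigma f} - \bar{x}^2 = \frac{1}{a^2}\left[\frac{\Sigma (ax-b)^2 f}{\Sigma f} - \left(\frac{\Sigma (ax-b) f}{\Sigma f}\right)^2\right]\]


END-OF-CHAPTER REVIEW EXERCISE 3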


1 Three boys and seven girls are asked how much money they have in their pockets. The boys have $2.50 each 
and the mean amount that the 10 children have is $3.90. 


a Show that the girls have a total of $31.50. [1] 


b Given that the seven girls have equal amounts of money, find the standard deviation of the 
amounts that the 10 children have. [3] 


2 Jean Luc was asked to record the times of 20 athletes in a long distance race. He started his stopwatch when
the race began and then went to sit in the shade, where he fell asleep. On waking, he found that x athletes had 
already completed the race but he was able to record the times taken by all the others. 


State the possible value(s) of x if Jean Luc was able to use his data to calculate: 
a the variance of the times taken by the 20 athletes [1] 
b the interquartile range of the times taken by the 20 athletes. [2] 


3 The quiz marks of nine students are written down in ascending order and it is found that the range and
interquartile range are equal. Find the greatest possible number of distinct marks that were obtained by the 
nine students. [2] 


4 Two days before a skiing competition, the depths of snow, x metres, at 32 points on the course were measured 
and it was discovered that the numerical values of \(\Sigma x\) and \(\Sigma x^2\) were equal.


a Given that the mean depth of snow was 0.885 m, find the standard deviation of x. [2] 


b Snow fell the day before the competition, increasing the depth over the whole course by 1.5cm. 
Explain what effect this had on the mean and on the standard deviation of x. [2] 


5 The following box plots summarise the percentage scores of a class of students in the three Mathematics tests 
they took this term. 


[Box plots for the three tests, drawn against a scale of percentage scores (%) from 30 to 100.]

a Describe the progress made by the class in Mathematics tests this term. [2] 
b Which of the tests has produced the least skewed set of scores? [1] 
c What type of skew do the scores in each of the other two tests have? [2] 


6 The following table shows the mean and standard deviation of the lengths of 75 adult puff adders (Bitis 
arietans), which are found in Africa and on the Arabian peninsula.


a Find the mean length of the 75 puff adders. [3] 


b The lengths of individual African puff adders are denoted by x, and the lengths of individual Arabian 
puff adders by x+. By first finding Ex} and xj, calculate the standard deviation of the lengths of all 
75 puff adders. [5] 
7 The scores obtained by 11 people throwing three darts each at a dartboard are 54, 46, 43, 52, 180, 50, 41, 56, 52, 49
and 54.


a Find the range, the interquartile range and the standard deviation of these scores. [4] 


b Which measure in part a best summarises the variation of the scores? Explain why you have chosen this 
particular measure. [2] 


8 The heights, x cm, of a group of 28 people were measured. The mean height was found to be 172.6 cm and the
standard deviation was found to be 4.58 cm. A person whose height was 161.8 cm left the group.


i Find the mean height of the remaining group of 27 people. [2] 


ii Find \(\Sigma x^2\) for the original group of 28 people. Hence find the standard deviation of the heights of the
remaining group of 27 people. [4] 


Cambridge International AS & A Level Mathematics 9709 Paper 63 Q4 June 2014 


9 120 people were asked to read an article in a newspaper. The times taken, to the nearest second, by the people to
read the article are summarised in the following table.


Time (seconds)      1-25   26-35   36-45   46-55   56-90
Number of people       4      24      38      34      20

Calculate estimates of the mean and standard deviation of the reading times. [5]


Cambridge International AS & A Level Mathematics 9709 Paper 62 Q2 June 2015 


10 The weights, in kilograms, of the 15 basketball players in each of two squads, A and B, are shown below.


Squad A    97   98  104   84  100  169  115   99  122   82  116   96   84  107   91
Squad B    75   79   94  101   96   77  111  108   83   84   86  115   82  113   95


i Represent the data by drawing a back-to-back stem-and-leaf diagram with squad A on the left-hand side
of the diagram and squad B on the right-hand side. [4]


ii Find the interquartile range of the weights of the players in squad A. [2] 


iii A new player joins squad B. The mean weight of the 16 players in squad B is now 93.9kg. Find the weight 
of the new player. [3] 


Cambridge International AS & A Level Mathematics 9709 Paper 62 Q5 November 2015 [Adapted] 


11 The heights, x cm, of a group of 82 children are summarised as follows.


\(\Sigma(x - 130) = -287\), standard deviation of x = 6.9.

i Find the mean height. [2]

ii Find \(\Sigma(x - 130)^2\). [2]
Cambridge International AS & A Level Mathematics 9709 Paper 63 Q2 June 2010 

12 A sample of 36 data values, x, gave \(\Sigma(x - 45) = -148\) and \(\Sigma(x - 45)^2 = 3089\).

i Find the mean and standard deviation of the 36 values. [3]

ii One extra data value of 29 was added to the sample. Find the standard deviation of all 37 values. [4]


Cambridge International AS & A Level Mathematics 9709 Paper 62 Q3 June 2011 


13 The ages, x years, of 150 cars are summarised by \(\Sigma x = 645\) and \(\Sigma x^2 = 8287.5\). Find \(\Sigma(x - \bar{x})^2\), where \(\bar{x}\) denotes
the mean of x. [4]


Cambridge International AS & A Level Mathematics 9709 Paper 62 Q1 June 2012 
14 A set of data values is 152, 164, 177, 191, 207, 250 and 258.
Compare the proportional change in the standard deviation with the proportional change in the interquartile
range when the value 250 in the data set is increased by 40%. [5]


15 A shop has in its stock 80 rectangular celebrity posters. All of these posters have a width to height ratio
of \(1 : \sqrt{2}\), and their mean perimeter is 231.8 cm.

Given that the sum of the squares of the widths is 200 120 cm², find the standard deviation of the widths of the
posters. [4]


16 At a village fair, visitors were asked to guess how many sweets are in a glass jar. The best six guesses were
180, 211, 230, 199, 214 and 166.


a Show that the mean of these guesses is 200, and use \(\text{SD} = \sqrt{\frac{\Sigma(x - \bar{x})^2}{n}}\) to calculate the standard deviation. [4]


b The jar actually contained 202 sweets. Without further calculation, write down the mean and the 
standard deviation of the errors made by these six visitors. Explain why no further calculations are 
required to do this. [4] 


17 The number of women in senior management positions at a number of companies was investigated. The
number of women at each of the 25 service companies and at each of the 16 industrial companies are denoted
by \(w_S\) and \(w_I\), respectively. The findings are summarised by the totals:

\(\Sigma(w_S - 5)^2 = 28\), \(\Sigma(w_S - 5) = 15\), \(\Sigma(w_I - 3)^2 = 12\) and \(\Sigma(w_I - 3) = -4\).


a Show that there are, on average, more than twice as many women in senior management
positions at the service companies than at the industrial companies. [3]


b Show that \(\Sigma w_S^2 \neq (\Sigma w_S)^2\) and that \(\Sigma w_I^2 \neq (\Sigma w_I)^2\). [5]


c Find the standard deviation of the number of women in senior management positions at all of these 
service and industrial companies together. [3] 


18 The ages, a years, of the five members of the boy-band AlphaAsise are such that \(\Sigma(a - 21)^2 = 11.46\) and
\(\Sigma(a - 21) = -6\).


The ages, b years, of the seven members of the boy-band BetaBeat are such that \(\Sigma(b - 18)^2 = 10.12\) and
\(\Sigma(b - 18) = 0\).


a Show that the difference between the mean ages of the boys in the two bands is 1.8 years. [3] 


b Find the variance of the ages of the 12 members of these two bands. [7] 
CROSS-TOPIC REVIEW EXERCISE 1


1 Two players, A and B, both played seven matches to reach the final of a tennis tournament. The number of games
that each of them won in these matches are given in the following back-to-back stem-and-leaf diagram.


Player A |   | Player B
         | 1 | 8 9
   2 2 0 | 2 | 1 1 3 4
   8 7 5 | 2 | 6
       3 | 3 |

Key: 5 | 2 | 6 represents 25 games for A and 26 games for B
a How many fewer games did player B win than player A? [1] 
b Find the median number of games won by each player. [2] 


c In a single stem-and-leaf diagram, show the number of games won by these two players in all of
the 14 matches they played to reach the final. [3] 


2 A total of 112 candidates took a multiple-choice test that had 40 questions. The numbers of correct 
answers given by the candidates are shown in the following table. 


No. correct answers    0-9   10-15   16-25   26-30   31-39   40
No. candidates          18      24      27      23      19    1


a State which class contains the lower quartile and which class contains the upper quartile. Hence, find
the least possible value of the interquartile range. [3]


b Copy and complete the following table, which shows the numbers of incorrect answers given by
the candidates in the test.


No. incorrect answers    0    1-9    ____   ____   ____   ____
No. candidates           1    ____   ____   ____   ____   ____    [3]


c Calculate an estimate of the mean number of incorrectly answered questions. [3] 


3 At a factory, 50-metre lengths of cotton thread are wound onto bobbins. Due to fraying, it is common for a
length, l cm, of cotton to be removed after it has been wound onto a bobbin. The following table summarises
the lengths of cotton thread removed from 200 bobbins.


Length removed (l cm)    0 < l < 2.5    2.5 ≤ l < 5.0    5.0 ≤ l < 10
No. bobbins                      137               49              14


a Calculate an estimate of the mean length of cotton removed. [3] 


b Use your answer to part a to calculate, in metres, an estimate of the standard deviation of the length 
of cotton remaining on the 200 bobbins. [4] 


4 People applying to a Computing college are given an aptitude test. Those who are accepted take a progress test 
3 months after the course has begun. The following table gives the aptitude test scores, x, and the progress test 
scores, y, for a random sample of eight students, A to H. 


a Find the interquartile range of these aptitude test scores. [1]


b Use the summary totals \(\Sigma x = 588\), \(\Sigma x^2 = 44\,080\), \(\Sigma y = 544\) and \(\Sigma y^2 = 38\,030\)
to calculate the variance of the aptitude and progress test scores when they are considered together. [3]

c The mean progress score for all the students at the college is 70 and the variance is 112. Any student
who scores less than 1.5 standard deviations below the mean is sent a letter advising that improvement
is needed. Which of the students A to H should the letter be sent to? [1]


5 The growth of 200 tomato plants, half of which were treated with a growth hormone, was monitored
over a 5-day period and is summarised in the following graphs.


[Graphs for the treated and untreated plants. Horizontal axis: Growth (cm), from 0 to 10; vertical axis marked from 0 to 100.]
Use the graphs to describe two advantages of treating these tomato plants with the growth hormone. [2] 


6 A survey of a random sample of 23 people recorded the number of unwanted emails they received in a particular
week. The results are given below. 


9 18 13 18 2017 42027" R 11) 26 26 32 17 3 20) om 13° 25°35 29) 14 
a Represent the data in a stem-and-leaf diagram. [3] 
b Draw, on graph paper, a box-and-whisker diagram to represent the data. [4] 


7 The volumes of water, \(x \times 10^6\) litres, needed to fill six Olympic-sized pools are
2.82, 2.50, 2.75, 3.14, 3.66 and 3.07.


a Find the value of \(\Sigma(x - 2)\) and of \(\Sigma(x - 2)^2\). [2]


b Use your answers to part a to find the mean and the standard deviation of the volumes of water, giving both 
answers correct to the nearest litre. [5] 


8 The speeds of 72 coaches at a certain point on their journeys between two cities were recorded. The results are
given in the following table.


Speed (km/h)               <50   <54   <70   <75   <85
No. coaches (cumulative)     0     9    41    54    72


a State the number of coaches whose speeds were between 54 and 70 km/h. [1] 

b A student has illustrated the data in a cumulative frequency polygon. Find the two speeds between which the 
polygon has the greatest gradient. [1] 

c Calculate an estimate of the lower boundary of the speeds of the fastest 25 coaches. [3] 
9 The following table shows the masses, m grams, of 100 unsealed bags of plain potato crisps.


Mass (m grams)    34.6 < m < 35.4    35.4 < m < 36.2    36.2 < m < 37.2
No. bags                       20                 30                 50


a Show that the heights of the columns in a histogram illustrating these data must be in the ratio 2:3: 4. [2] 


b Calculate estimates of the mean and of the standard deviation of the masses. [4] 


c Before each bag is sealed, 0.05 grams of salt is added. Find the variance of the masses of the sealed bags of 
salted potato crisps. [1] 


10 The masses, in carats, of a sample of 200 pearls are summarised in the following cumulative frequency graph.
One carat is equivalent to 200 milligrams.


[Cumulative frequency graph. Horizontal axis: Mass (carats); vertical axis: Cumulative frequency (No. pearls).]


a Use the graph to estimate, in carats: 
i the median mass of the pearls [1] 
ii the interquartile range of the masses. [2] 


b To qualify as a ‘paragon’, a pearl must be flawless and weigh at least 20 grams. Use the graph to estimate the 
largest possible number of paragons in the sample. [2] 


11 The amounts spent, S dollars, by six customers at a hairdressing salon yesterday were as follows.

12.50, 15.75, 41.30, 34.20, 10.80, 40.85.

Each of the customers paid with a $50 note and each received the correct change, which is denoted by $C.

a Find, in dollars, the value of S + C and of S − C. [3]


b Explain why the standard deviation of S and the standard deviation of C are identical. [3] 
12 The numbers of goals, x, scored by a team in each of its previous 25 games are summarised by the totals

\(\Sigma(x - 1)^2 = 30\) and \(\Sigma(x - 1) = 12\).

a Find the mean number of goals that the team scored per game. [2] 
b Find the value of £x?. [3] 


c Find the value of a and of b in the following table, which shows the frequencies of the numbers 
of goals scored by the team. 


No. goals    0    1    2    3    >3
No. games    5    a    b    4     0        [2]


13 The lengths of some insects of the same type from two countries, X and Y, were measured.
The stem-and-leaf diagram shows the results.

                               Country X |    | Country Y
(10)                 9 7 6 6 6 4 4 4 3 2 | 80 |
(18) 8 8 8 7 7 6 6 5 5 5 4 4 3 3 3 2 2 0 | 81 | 1 1 2 2 3 3 3 5 5 6 7 8 9          (13)
(16)     9 9 9 8 8 7 7 6 5 5 3 2 2 2 0 0 | 82 | 0 0 1 2 3 3 3 q 4 5 6 6 7 8 8      (15)
(16)     8 7 6 5 5 5 3 3 2 2 2 1 1 1 0 0 | 83 | 0 1 2 2 4 4 4 4 5 5 6 6 7 7 7 8 9  (17)
(11)               8 7 6 5 5 4 4 3 3 1 1 | 84 | 0 0 1 2 4 4 5 5 6 6 7 7 7 8 9      (15)
                                         | 85 | 1 2 r 3 3 5 5 6 6 7 8 8            (12)
                                         | 86 | 0 1 2 2 3 5 5 5 8 9 9              (11)

Key: 5 | 81 | 3 means an insect from country X has length 0.815 cm
and an insect from country Y has length 0.813 cm.

i Find the median and interquartile range of the lengths of the insects from country X. [2] 


ii The interquartile range of the lengths of the insects from country Y is 0.028cm. Find the values of q and r. [2] 
iii Represent the data by means of a pair of box-and-whisker plots in a single diagram on graph paper. [4]


iv Compare the lengths of the insects from the two countries. [2] 
Cambridge International AS & A Level Mathematics 9709 Paper 63 Q6 June 2010 
Probability


In this chapter you will learn how to:

■ evaluate probabilities by means of enumeration of equiprobable (i.e. equally likely)
elementary events

■ use addition and multiplication of probabilities appropriately

■ use the terms mutually exclusive and independent events

■ determine whether two events are independent

■ calculate and use conditional probabilities.


Chapter 4: Probability 


PREREQUISITE KNOWLEDGE


Where it comes from: IGCSE / O Level Mathematics

What you should be able to do: Calculate the probability of a single event as either a fraction, decimal or
percentage. Understand and use the probability scale from 0 to 1.
Check your skills: 1 How many 6s are expected when an ordinary fair die is rolled 180 times?

What you should be able to do: Calculate the probability of simple combined events, using possibility
diagrams and tree diagrams where appropriate.
Check your skills: 2 Find the probability of obtaining a total of 4 when the scores on two ordinary fair dice
are added together.

What you should be able to do: Use language, notation and Venn diagrams to describe sets and represent
relationships between sets.
Check your skills: 3 It is given that A = {1, 2, 5}, B′ = {2, 4, 5} and A′ = {3, 4}. Using a Venn diagram, or
otherwise, find n(A ∪ B′) and n(A′ ∩ B).

What you should be able to do: Understand relative frequency as an estimate of probability.


If we do this, how likely is that? 

Probability measures the likelihood of an event occurring on a scale from 0 (i.e. impossible) to
1 (i.e. certain). We write this as P(name of event), and its value can be expressed as a fraction,
decimal or percentage. The greater the probability, the more likely the event is to occur.


Although we do not often calculate probabilities in our daily lives, we frequently assess and
compare them, and this affects our behaviour. Do we have a better chance of performing
well in an exam after a good night’s sleep or after revising late into the night? Should you
visit a doctor or is your sore throat likely to heal by itself soon?


Insurance is based on risk, which in turn is based on the probability of certain events 
occurring. Government spending is largely determined by the probable benefits it will 
bring to society. 


4.1 Experiments, events and outcomes 
The result of an experiment is called an outcome or elementary event, and a combination 
of these is known simply as an event. 


Rolling an ordinary fair die is an experiment that has six possible outcomes: 1, 2, 3, 4, 5 or 6. 
Obtaining an odd number with the die is an event that has three favourable outcomes: 1, 3 or 5. 

Random selection and equiprobable events

The purpose of selecting objects at random is to ensure that each has the same chance of
being selected. This method of selection is called fair or unbiased, and the selection of any
particular object is said to be equally likely or equiprobable.


KEY POINT 4.1

When one object is randomly selected from n objects, P(selecting any particular object) = $\frac{1}{n}$.

The probability that an event occurs is equal to the proportion of equally likely outcomes
that are favourable to the event.

$$P(\text{event}) = \frac{\text{Number of favourable equally likely outcomes}}{\text{Total number of equally likely outcomes}}$$


Consider randomly selecting 1 student from a group of 19, where 11 are boys and eight are
girls.


There are 19 possible outcomes: 11 are favourable to the event selecting a boy and eight are 
favourable to the event selecting a girl, as shown in the following table. 


| Event | Probability | Explanation |
| --- | --- | --- |
| Selecting any particular boy | $\frac{1}{19}$ | These three outcomes are equally likely. |
| Selecting any particular girl | $\frac{1}{19}$ | |
| Selecting any particular student | $\frac{1}{19}$ | |
| Selecting a boy | $\frac{11}{19}$ | 11 of the 19 equally likely outcomes are favourable to this event. |
| Selecting a girl | $\frac{8}{19}$ | 8 of the 19 equally likely outcomes are favourable to this event. |

The word particular specifies one object. It does not matter whether that object is a boy, a girl or a student; each has a $\frac{1}{19}$ chance of being selected.
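
The probabilities in the table can be checked by counting outcomes directly. The following is a minimal Python sketch, assuming we label the 19 students ourselves (the labels and the helper name `probability` are illustrative and do not come from the text); it applies the rule P(event) = favourable outcomes ÷ total outcomes.

```python
from fractions import Fraction

# Hypothetical labels for the 19 students: 11 boys and 8 girls.
students = [("boy", i) for i in range(1, 12)] + [("girl", i) for i in range(1, 9)]

def probability(is_favourable, outcomes):
    """P(event) = favourable equally likely outcomes / total equally likely outcomes."""
    favourable = sum(1 for outcome in outcomes if is_favourable(outcome))
    return Fraction(favourable, len(outcomes))

p_particular_boy = probability(lambda s: s == ("boy", 1), students)
p_boy = probability(lambda s: s[0] == "boy", students)
p_girl = probability(lambda s: s[0] == "girl", students)

print(p_particular_boy, p_boy, p_girl)  # 1/19 11/19 8/19
```

Using `Fraction` keeps the answers exact, which matches the way the probabilities are written in the table.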


Exhaustive events 

A set of events that contains all the possible outcomes of an experiment is said to be
exhaustive. In the special case of event A and its complement, not A, the sum of their
probabilities is 1 because one of them is certain to occur. Recall that the notation used for
the complement of set A is A’.


P(A) + P(not A)=1 
or 
P(A) + P(A’) = 1


Examples of complementary exhaustive events are shown in the following table. 


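| Experiment | Event | Complement | Sum of probabilities |
| --- | --- | --- | --- |
| Toss a fair coin | heads | tails | $\frac{1}{2} + \frac{1}{2} = 1$ |
| Roll a fair die | less than 2 | 2 or more | $\frac{1}{6} + \frac{5}{6} = 1$ |
| Play a game of chess | win | not win | P(win) + P(not win) = 1 |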


Trials and expectation 
Each repeat of an experiment is called a trial. The proportion of trials in which an event
occurs is its relative frequency, and we can use this as an estimate of the probability that the
event occurs.

KEY POINT 4.4

In n trials, event A is expected to occur n × P(A) times.


If we know the probability of an event occurring, we can estimate the number of times it is 
likely to occur in a series of trials. This is a statement of our expectation. 




WORKED EXAMPLE 4.1

The probability of rain on any particular day in a mountain village is 0.2.

On how many days is rain not expected in a year of 365 days?

Answer
n = 365 and P(does not rain) = 1 − 0.2 = 0.8

365 × 0.8 = 292 days
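
The expectation rule n × P(A) is easy to apply in code. This is a small sketch in Python, not part of the original text; the function name `expected_occurrences` is an assumption made for illustration, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction

def expected_occurrences(n_trials, p_event):
    """Expected number of times an event occurs in n_trials trials: n x P(A)."""
    return n_trials * p_event

p_rain = Fraction(1, 5)      # 0.2, from the worked example
p_no_rain = 1 - p_rain       # complementary event
dry_days = expected_occurrences(365, p_no_rain)

print(dry_days)  # 292
```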


EXPLORE 4.1 


We can see how closely expectation matches with what happens in practice by 
conducting simple experiments using a fair coin (or an ordinary fair die). 


Toss the coin 10 times and note as a decimal the proportion of heads obtained. 
Repeat this and note the proportion of heads obtained in 20 trials. Continue doing 
this so that you have a series of decimals for the proportion of heads obtained in 
10, 20, 30, 40, 50, ... trials. 


Represent these proportions on a graph by plotting them against the total number of 
trials conducted. How do your results compare with the expected proportion of heads? 


For trials with a die, draw a graph to represent the proportions of odd numbers 
obtained. 
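
If no coin is to hand, the experiment can be imitated on a computer. The sketch below is one possible simulation in Python, assuming a pseudo-random generator is an acceptable stand-in for a fair coin; the function name `running_proportions`, the seed and the number of tosses are arbitrary choices for illustration.

```python
import random

def running_proportions(num_trials, step=10, seed=0):
    """Toss a simulated fair coin and record the proportion of heads
    after every `step` tosses."""
    rng = random.Random(seed)
    heads = 0
    results = []
    for trial in range(1, num_trials + 1):
        heads += rng.random() < 0.5      # True counts as 1 head
        if trial % step == 0:
            results.append((trial, heads / trial))
    return results

proportions = running_proportions(10_000)
for trials, proportion in proportions[:5]:
    print(trials, round(proportion, 3))
print(proportions[-1])
```

Plotting the recorded pairs gives the graph described above; the proportions typically settle close to the expected value of 0.5 as the number of trials grows.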


EXERCISE 4A 


1 A teacher randomly selects one student from a group of 12 boys and 24 girls.
Find the probability that the teacher selects:

a a particular boy b a girl.

2 United’s manager estimates that the team has a 65% chance of winning any 
particular game and an 85% chance of not drawing any particular game. 
a What are the manager’s estimates most likely to be based on? 


b If the team plays 40 games this season, find the manager’s expectation of the
number of games the team will lose. 


c If the team loses one game more than the manager expects this season,
explain why this does not necessarily mean that they performed below 
expectation. 


3 Katya randomly picks one of the 10 cards shown. 


[Figure: the 10 cards]


If she repeats this 40 times, how many times is Katya expected to pick a card that 
is not blue and does not have a letter B on it? 
4 A numbered wheel is divided into eight sectors of equal size, as shown. The
wheel is spun until it stops with the arrow pointing at one of the numbers.
Axel decides to spin the wheel 400 times.

[Figure: the numbered wheel]
a Find the number of times the arrow is not expected to point at a 4. 
b How many more times must Axel spin the wheel so that the expected 
number of times that the arrow points at a 4 is at least 160? 
5 A bag contains black and white counters, and the probability of selecting a black 
counter is L 


a What is the smallest possible number of white counters in the bag? 


b Without replacement, three counters are taken from the bag and 
they are all black. What is the smallest possible number of white 
counters in the bag? 


6 When a coin is randomly selected from a savings box, each coin has a 98%
chance of not being selected. How many coins are in the savings box? 


7 A set of data values is 8, 13, 17, 18, 24, 32, 34 and 38. Find the probability
that a randomly selected value is more than one standard deviation from
the mean.

REWIND: We studied the mean in Chapter 2, Section 2.2 and standard deviation in Chapter 3, Section 3.4.

(PS) 8 One student is randomly selected from a school that has 837 boys.
The probability that a girl is selected is 4. Find the probability that a
particular boy is selected.


4.2 Mutually exclusive events and the addition law 


To find the probability that event A or event B occurs, we can simply add the probabilities 
of the two events together, but only if A and B are mutually exclusive.


Mutually exclusive events have no common favourable outcomes, which means that it is 
not possible for both events to occur, so P(A and B)=0. 


For example, when we roll an ordinary die, the events ‘even number’ = {2, 4, 6} and ‘factor
of 5’ = {1, 5} are mutually exclusive because they have no common favourable outcomes.
It is not possible to roll a number that is even and a factor of 5. We say that the intersection
of these two sets is empty. Therefore:


P(even or factor of 5) = P(even) + P(factor of 5)


Events are not mutually exclusive if they have at least one common favourable outcome,
which means that it is possible for both events to occur, so P(A and B) ≠ 0.


For example, when we roll an ordinary die, the events ‘odd number’ = {1, 3, 5} and
‘factor of 5’ = {1, 5} are not mutually exclusive because they do have common favourable
outcomes. It is possible to roll a number that is odd and a factor of 5. We say that the
intersection of these two sets is not empty. Therefore:


P(odd or factor of 5) ≠ P(odd) + P(factor of 5)

The addition law for mutually exclusive events is P(A or B) = P(A) + P(B).


This can be extended for any number of mutually exclusive events:
P(A or B or C or ...) = P(A) + P(B) + P(C) + ...
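
The two die examples above can be verified with sets. The following Python sketch is an illustration only (the names `DIE` and `prob` are assumptions, not notation from the text): it shows that adding probabilities gives the right answer when the intersection is empty and overcounts when it is not.

```python
from fractions import Fraction

DIE = frozenset({1, 2, 3, 4, 5, 6})

def prob(event):
    """Probability of an event (a set of scores) on one roll of a fair die."""
    return Fraction(len(event & DIE), len(DIE))

even = {2, 4, 6}
odd = {1, 3, 5}
factor_of_5 = {1, 5}

# Mutually exclusive: empty intersection, so the probabilities simply add.
exclusive = not (even & factor_of_5)
p_even_or_factor = prob(even | factor_of_5)

# Not mutually exclusive: 1 and 5 are common, so adding would count them twice.
overlapping = bool(odd & factor_of_5)
p_odd_or_factor = prob(odd | factor_of_5)

print(exclusive, p_even_or_factor, prob(even) + prob(factor_of_5))  # True 5/6 5/6
print(overlapping, p_odd_or_factor, prob(odd) + prob(factor_of_5))  # True 1/2 5/6
```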


Venn diagrams 

Venn diagrams are useful tools for solving problems in probability. We can use them to
show favourable outcomes or the number of favourable outcomes or the probabilities of
particular events.

The universal set ℰ represents the complete set of outcomes and is called the possibility space.

The set of outcomes that are not favourable to event A is the complement of A, denoted by A’.


The number of outcomes favourable to event A is denoted by n(A). 


The following Venn diagrams illustrate various sets and their complements. 


[Venn diagrams illustrating: not A; neither A nor B; not both A and B]

‘A or B’ means event A occurs or event B occurs or both occur.
$A \cup B$ means ‘A or B’. $A \cap B$ means ‘A and B’.


Using set notation, the addition law for two mutually exclusive events is $P(A \cup B) = P(A) + P(B)$.


A and B are mutually exclusive when $P(A \cap B) = 0$; that is, when $A \cap B = \varnothing$ ($\varnothing$ means the
empty set).


For non-mutually exclusive events, $P(A \cup B)$ can be found by enumerating (counting) the
favourable equally likely outcomes, taking care not to count any of them twice. We show 
how this can be done in part b of the following example. 


WORKED EXAMPLE 4.2 


One digit is randomly selected from 1, 2, 3, 4, 5, 6, 7, 8 and 9. Three possible events are:
A: a multiple of 3 is selected.
B: a factor of 8 is selected.
C: a prime number is selected.
a Show that the only pair of mutually exclusive events from A, B and C is A and B, and find $P(A \cup B)$.

b Find:

i $P(A \cup C)$ ii $P(B \cup C)$.

Answer
a $\mathcal{E} = \{1, 2, 3, 4, 5, 6, 7, 8, 9\}$
$A = \{3, 6, 9\}$, so $P(A) = \frac{3}{9}$.
$B = \{1, 2, 4, 8\}$, so $P(B) = \frac{4}{9}$.
$C = \{2, 3, 5, 7\}$, so $P(C) = \frac{4}{9}$.

$A \cap B = \varnothing$, so A and B are mutually exclusive.
$A \cap C \neq \varnothing$, so A and C are not mutually exclusive.
$B \cap C \neq \varnothing$, so B and C are not mutually exclusive.

$$P(A \cup B) = P(A \text{ or } B) = P(A) + P(B) = \frac{3}{9} + \frac{4}{9} = \frac{7}{9}$$

b Both parts of this question can be answered using the lists of elements or the previous Venn diagrams.

i $n(A \cup C) = n(A) + n(C) - n(A \cap C) = 3 + 4 - 1 = 6$

$$P(A \cup C) = P(A) + P(C) - P(A \cap C) = \frac{3}{9} + \frac{4}{9} - \frac{1}{9} = \frac{6}{9} = \frac{2}{3}$$

ii $n(B \cup C) = n(B) + n(C) - n(B \cap C) = 4 + 4 - 1 = 7$

$$P(B \cup C) = P(B) + P(C) - P(B \cap C) = \frac{4}{9} + \frac{4}{9} - \frac{1}{9} = \frac{7}{9}$$

In the first part of b above, we subtracted $n(A \cap C)$ because the
common elements in $A \cap C$ have been counted in n(A) and in n(C).


We follow the same steps when working directly with probabilities.


This also applies to events that are mutually exclusive, where the
number of common elements is equal to zero.


So in general, for any two events A and B:
$n(A \cup B) = n(A) + n(B) - n(A \cap B)$ and
$P(A \cup B) = P(A) + P(B) - P(A \cap B)$.
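
The general rule can be checked against direct counting for the three events of Worked example 4.2. The sketch below is illustrative Python, with helper names (`prob`, `union_by_formula`) chosen here rather than taken from the text; it compares the probability of each union found by listing elements with the value given by the formula.

```python
from fractions import Fraction

digits = set(range(1, 10))
A = {d for d in digits if d % 3 == 0}   # multiples of 3: {3, 6, 9}
B = {d for d in digits if 8 % d == 0}   # factors of 8: {1, 2, 4, 8}
C = {2, 3, 5, 7}                        # primes up to 9

def prob(event):
    """Probability that the randomly selected digit lies in the event."""
    return Fraction(len(event), len(digits))

def union_by_formula(X, Y):
    """P(X or Y) = P(X) + P(Y) - P(X and Y)."""
    return prob(X) + prob(Y) - prob(X & Y)

for name, X, Y in [("A or B", A, B), ("A or C", A, C), ("B or C", B, C)]:
    print(name, prob(X | Y), union_by_formula(X, Y))
```

For A and B the subtracted term is zero, so the formula reduces to the addition law for mutually exclusive events.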
WORKED EXAMPLE 4.3


In a survey, 50% of the participants own a desktop (D), 60% own a laptop (L) and
15% own both.


What percentage of the participants owns neither a desktop nor a laptop?


Answer

[Venn diagram: sets D and L, with regions p (desktop only), 0.15 (both), q (laptop only) and x (neither)]

p = 0.5 − 0.15 = 0.35
q = 0.6 − 0.15 = 0.45
x = 1 − (0.35 + 0.15 + 0.45) = 0.05 or 5%


∴ 5% of the participants own neither a desktop
nor a laptop.


The symbol ∴ means
‘therefore’.


WORKED EXAMPLE 4.4

Forty children were each asked which fruits they like from apples (A),
bananas (B) and cherries (C).

The following Venn diagram shows the number of children that like each type
of fruit.

[Venn diagram: 40 children; sets A, B and C]

Find the probability that a randomly selected child likes apples or bananas.


Answer

$$P(A \cup B) = P(A) + P(B) - P(A \cap B) = \frac{17}{40} + \frac{8}{40} - \frac{4}{40} = \frac{21}{40}$$

Alternatively, we can add up the
numbers in the A and B circles:
6 + 7 + 3 + 1 + 2 + 2 = 21.

KEY POINT 4.7


For any two events, A and B, $P(A \text{ or } B) = P(A \cup B) = P(A) + P(B) - P(A \cap B)$.
EXERCISE 4B


1 Find the probability that the number rolled with an ordinary fair die is:
a a prime number or a 4 b a square number or a multiple of 3 c more than 3 or a factor of 8.

2 A group of 40 students took a test in Economics. The following Venn
diagram shows that 19 boys (B) took the test and that seven students
failed the test (F).

[Venn diagram: 40 students; sets B and F; the value 17 is shown]

a Describe the 21 students who are members of the set B’.


b Find the probability that a randomly selected student is a boy or
someone who failed the test.


3 The following table gives information about all the animals on a farm.


a Find the probability that a randomly selected animal is:
i male or a goat ii a sheep or female.


b Find a different way of describing each of the two types of animal in part a.


4 Two ordinary fair dice are rolled and three events are:
X: the sum of the two numbers rolled is 6.
Y: the difference between the two numbers rolled is zero.
Z: both of the numbers rolled are even.
a List the outcomes that are favourable to:
i X and Y ii X and Z iii Y and Z.
b What do your answers to part a tell you about the events X, Y and Z?

5 The letters A, F, B, B, C, D, D and E are written onto eight cards and placed in a bag. Find the probability that
the letter on a randomly selected card is:
a a vowel or in the word DOMAIN
b a consonant or in the word DOUBLE.


6 In a group of 25 boys, nine are members of the chess club (C), eight
are members of the debating club (D) and 10 are members of
neither of these clubs. This information is shown in the Venn
diagram.

[Venn diagram: 25 boys; sets C (9 members) and D (8 members) with regions a, b and c; 10 boys outside both sets]

a Find the values of a, b and c.
b Find the probability that a randomly selected boy is:
i a member of the chess club or the debating club

ii a member of exactly one of these clubs.

7 Forty girls were asked to name the capital of Cuba and of Hungary; 19 knew the capital of Cuba, 20 knew the
capital of Hungary and seven knew both. 


a Draw a Venn diagram showing the number of girls who knew each of these capitals. 
b Find the probability that a randomly selected girl knew: 
i the capital of Cuba but not of Hungary ii just one of these capitals. 


8 In a survey on pet ownership, 36% of the participants own a cat, 20% own a hamster but not a cat, and 8%
own a hamster and a cat. What percentage of the participants owns neither a hamster nor a cat? 


9 A garage repaired 132 vehicles last month. The number of vehicles that 
required electrical (E), mechanical (M) and bodywork (B) repairs are given in the
diagram opposite. 


Find the probability that a randomly selected vehicle required: 
a mechanical or bodywork repairs 


b no bodywork repairs 


c exactly two types of repair. 


10 The 109 students at a technical college must study at least one subject from 
Pure Mathematics (P), Statistics (S) and Mechanics (M). The numbers 
studying these subjects are given in the diagram opposite. 


[Venn diagram: sets P, S and M]


a Who does the number 17 in the diagram refer to? 


b Find the probability that a randomly selected student studies: 


i Pure Mathematics or Mechanics 
ii exactly two of these subjects. 


c List the three subjects in ascending order of popularity. 


11 Events X and Y are such that P(X) = 0.5, P(Y) = 0.6 and $P(X \cap Y) = 0.2$.
a State, giving a reason, whether events X and Y are mutually exclusive.
b Using a Venn diagram, or otherwise, find $P(X \cup Y)$.
c Find the probability that X or Y, but not both, occurs.


12 A, B and C are events where P(A) = 0.3, P(B) = 0.4, P(C) = 0.3, $P(A \cap B) = 0.12$, $P(A \cap C) = 0$ and $P(B \cap C) = 0.1$.
a State which pair of events from A, B and C is mutually exclusive.


b Using a Venn diagram, or otherwise, find $P[(A \cup B \cup C)']$,
which is the probability that neither A nor B nor C occurs.


13 The diagram opposite shows a 30cm square board with two rectangular
cards attached. The 15cm by 20cm card covers one-quarter of the 8cm by
12cm card.

[Diagram: 30cm by 30cm board with the two cards attached]

A dart is randomly thrown at the board, so that it sticks within its
perimeter. Use areas to calculate the probability that the dart pierces:

a both cards

b at least one of the cards

c exactly one of the cards.

14 Given that P(A) = 0.4, P(B) = 0.7 and that $P(A \cup B) = 0.8$, find:
a $P(A \cup B')$ b $P(A' \cap B)$


(PS) 15 Each of 27 tourists was asked which of the countries Angola (A), Burundi (B) and Cameroon (C) they had
visited. Of the group, 15 had visited Angola; 8 had visited Burundi; 12 had visited Cameroon; 2 had visited all
three countries; and 21 had visited only one. Of those who had visited Angola, 4 had visited only one other
country. Of those who had not visited Angola, 5 had visited Burundi only. All of the tourists had visited at
least one of these countries.
a Draw a fully labelled Venn diagram to illustrate this information.
b Find the number of tourists in set A’ and describe them.


c Describe the tourists in set (4-8) C’ and state how many there are. 


d Find the probability that a randomly selected tourist from this group had visited at least two of these three 


countries. 


4.3 Independent events and the multiplication law 

Two events are said to be independent if either can occur without being affected by the 
occurrence of the other. Examples of this are making selections with replacement and 
performing separate actions, such as rolling two dice. 


KEY POINT 4.8


The multiplication law for independent events is $P(A \text{ and } B) = P(A \cap B) = P(A) \times P(B)$.


This can be extended for any number of independent events:

$P(A \text{ and } B \text{ and } C \text{ and } \ldots) = P(A \cap B \cap C \cap \ldots) = P(A) \times P(B) \times P(C) \times \ldots$


Consider the following bag, which contains two blue balls (B) 
and five white balls (W). 


We will select one ball at random, replace it and then select 
another ball. 


For the first selection: $P(B) = \frac{2}{7}$ and $P(W) = \frac{5}{7}$.

For the second selection: $P(B) = \frac{2}{7}$ and $P(W) = \frac{5}{7}$.


The tree diagram below shows how we can use the 
multiplication law to find probabilities. 


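| 1st | 2nd | Events and probabilities |
| --- | --- | --- |
| B ($\frac{2}{7}$) | B ($\frac{2}{7}$) | $P(BB) = \frac{2}{7} \times \frac{2}{7} = \frac{4}{49}$ |
| B ($\frac{2}{7}$) | W ($\frac{5}{7}$) | $P(BW) = \frac{2}{7} \times \frac{5}{7} = \frac{10}{49}$ |
| W ($\frac{5}{7}$) | B ($\frac{2}{7}$) | $P(WB) = \frac{5}{7} \times \frac{2}{7} = \frac{10}{49}$ |
| W ($\frac{5}{7}$) | W ($\frac{5}{7}$) | $P(WW) = \frac{5}{7} \times \frac{5}{7} = \frac{25}{49}$ |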

The first and second selections are made from the same seven balls, so probabilities are identical and independent.

We can denote the event ‘2 blue balls are selected’ by BB; B and B; B & B or B, B.

Events BB, BW, WB and WW are exhaustive, so their probabilities sum to 1.




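Multiplication of independent events is performed from left to right along the branches.
Addition of mutually exclusive events is performed vertically.
As examples:

$$P(\text{different colours}) = P(BW \text{ or } WB) = P(BW) + P(WB) = \left(\frac{2}{7} \times \frac{5}{7}\right) + \left(\frac{5}{7} \times \frac{2}{7}\right) = \frac{10}{49} + \frac{10}{49} = \frac{20}{49}$$

$$P(\text{same colours}) = P(BB \text{ or } WW) = P(BB) + P(WW) = \left(\frac{2}{7} \times \frac{2}{7}\right) + \left(\frac{5}{7} \times \frac{5}{7}\right) = \frac{4}{49} + \frac{25}{49} = \frac{29}{49}$$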
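
The tree-diagram method (multiply along branches, add between mutually exclusive paths) can be sketched in a few lines of Python. This is an illustration under the stated set-up of two blue and five white balls with replacement; the dictionary names `branch` and `paths` are assumptions made here.

```python
from fractions import Fraction
from itertools import product

# Branch probabilities for one selection (2 blue, 5 white, with replacement).
branch = {"B": Fraction(2, 7), "W": Fraction(5, 7)}

# Multiply along the branches: the two selections are independent.
paths = {first + second: branch[first] * branch[second]
         for first, second in product(branch, repeat=2)}

# Add vertically: the four paths are mutually exclusive.
p_different = paths["BW"] + paths["WB"]
p_same = paths["BB"] + paths["WW"]

print(paths)
print(p_different, p_same, sum(paths.values()))  # 20/49 29/49 1
```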


As an alternative to using a tree diagram, we can use a possibility diagram (or outcome
space), as shown below. The diagram shows the 7 × 7 = 49 equally likely outcomes and the
four mutually exclusive combined events BB, BW, WB and WW.

[Possibility diagram: 1st selection against 2nd selection]

If we just used a 2 by 2 diagram, with B and W as the outcomes of each selection, we could not just count cells to find probabilities, because the events in those four cells would not be equally likely.

To find probabilities for combined events, we count how many of the 49 outcomes are
favourable.

Probabilities are equal to the relative frequencies of favourable outcomes.

For example: $P(\text{at least 1 blue}) = P(BB) + P(BW) + P(WB) = \frac{4}{49} + \frac{10}{49} + \frac{10}{49} = \frac{24}{49}$.

Alternatively, $P(\text{at least 1 blue}) = 1 - P(WW) = 1 - \frac{25}{49} = \frac{24}{49}$.


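WORKED EXAMPLE 4.5


Find the probability that the sum of the scores on three rolls of an ordinary fair die is less than 5.


Answer
Each of the $6^3 = 216$ possible outcomes is equally likely, with probability $\left(\frac{1}{6}\right)^3 = \frac{1}{216}$.

The sum is less than 5 for the four outcomes (1, 1, 1), (1, 1, 2), (1, 2, 1) and (2, 1, 1), so

$$P(\text{sum} < 5) = 4 \times \frac{1}{216} = \frac{4}{216} = \frac{1}{54}$$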
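
The answer can be confirmed by listing every possible triple of scores. This short Python sketch is an illustrative check rather than the method used in the worked example.

```python
from fractions import Fraction
from itertools import product

rolls = list(product(range(1, 7), repeat=3))   # 6^3 = 216 equally likely outcomes
favourable = [r for r in rolls if sum(r) < 5]

print(favourable)
print(Fraction(len(favourable), len(rolls)))   # 1/54
```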
WORKED EXAMPLE 4.6


Abha passes through three independent sets of traffic lights when she drives to work. The probability that she has
to stop at any particular set of lights is 0.2. Find the probability that Abha:


a first has to stop at the second set of lights
b has to stop at exactly one set of lights
c has to stop at any set of lights.


Answer
We use S and X to represent ‘stopping’ and ‘not stopping’. For each set of lights, P(S) = 0.2 and P(X) = 0.8.

a P(XS) = 0.8 × 0.2 = 0.16

b The three favourable outcomes, SXX, XSX and XXS, are equally likely.
P(has to stop at exactly 1 set of lights)
= P(SXX) + P(XSX) + P(XXS)
= 3 × (0.2 × 0.8 × 0.8)
= 0.384

c The events ‘has to stop’ and ‘does not have to stop’ are complementary.
P(has to stop) = 1 − P(does not have to stop)
= 1 − P(XXX)
= $1 - 0.8^3$
= 0.488

Alternatively, P(has to stop) = P(S) + P(XS) + P(XXS).
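
All three parts can be checked by listing the eight possible sequences of S and X. The sketch below is illustrative Python; the helper `path_probability` is a name chosen here, and exact fractions are used before converting to decimals for display.

```python
from fractions import Fraction
from itertools import product

p = {"S": Fraction(1, 5), "X": Fraction(4, 5)}   # stop 0.2, not stop 0.8

def path_probability(path):
    """Multiply the probabilities along one sequence such as 'XSX'."""
    result = Fraction(1)
    for light in path:
        result *= p[light]
    return result

outcomes = ["".join(path) for path in product("SX", repeat=3)]

first_stop_at_second = sum(path_probability(o) for o in outcomes if o.startswith("XS"))
exactly_one = sum(path_probability(o) for o in outcomes if o.count("S") == 1)
at_least_one = 1 - path_probability("XXX")
alternative = path_probability("S") + path_probability("XS") + path_probability("XXS")

print(float(first_stop_at_second), float(exactly_one), float(at_least_one))  # 0.16 0.384 0.488
```

The value `alternative` follows the last line of the worked example and agrees with the complement method.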


EXERCISE 4C 


1 Using a tree diagram, find the probability that exactly one head is obtained when two fair coins are tossed. 


2 Two ordinary fair dice are rolled. Using a possibility diagram, find the probability of obtaining: 
a two 6s b two even numbers c two numbers whose product is 6. 

3 It is known that 8% of all new FunX cars develop a mechanical fault within a year and that 15% independently 
develop an electrical fault within a year. Find the probability that within a year a new FunX car develops: 
a both types of fault b neither type of fault. 

4 A certain horse has a 70% chance of winning any particular race. Find the probability that it wins exactly one
of its next two races. 

5 The probabilities that a team wins, draws or loses any particular game are 0.6, 0.1 and 0.3, respectively. 
a Find the probability that the team wins at least one of its next two games 
b If 2 points are awarded for a win, 1 point for a draw and 0 points for a loss, find the probability that the

team scores a total of more than 1 point in its next two games. 
6 On any particular day, there is a 30% chance of snow in Slushly. Find the probability that it snows there on: 


a none of the next 3 days b exactly one of the next 3 days. 

(M) 7 Fatima will enter three sporting events at the weekend.
Her chances of winning each of them are shown in the following table.

| Event | Shot put | Javelin | Discus |
| --- | --- | --- | --- |
| Chance of winning | 85% | 40% | 64% |


a Assuming that the three events are independent, find the probability that Fatima wins: 
i the shot put and discus ii the shot put and discus only iii exactly two of these events. 


b What does ‘the three events are independent’ mean here? Give a reason why this may not be true in real life. 


8 A fair six-sided spinner, P, has edges marked 0, 1, 2, 2, 3 and 4. 
A fair four-sided spinner, Q, has edges marked 0, —1, —1 and —2. 


Each spinner is spun once and the numbers on which they come to rest are added together to give 
the score, S. Find: 


a P(S = 2) b $P(S^2 = 1)$


9 Letters and packages can take up to 2 days to be delivered by Speedipost couriers. The following table shows
the percentage of items delivered at certain times after sending. 


a Is there any truth in the statement ‘If you post 10 letters on Monday then only nine of them will be 
delivered before Wednesday’? Give a reason for your answer. 


b Find the probability that when three letters are posted on Monday, none of them are delivered on Tuesday. 


c Find the probability that when a letter and a package are posted together, the letter arrives at least 1 day
before the package. 


10 The following histogram represents the results of
a national survey on bus departure delay times.

[Histogram: frequency density (% of buses per minute) against delay time (min), from 0 to 10]

Two buses are selected at random. Calculate an
estimate of the probability that:

a both departures were delayed by less than 4 minutes

b at least one of the buses departed more than 7 minutes late.


11 Praveen wants to speak on the telephone to his friend. When his friend’s phone rings, he answers it with 
constant probability 0.6. If Praveen’s friend doesn’t answer his phone, Praveen will call later, but he will 
only try four times altogether. Find the probability that Praveen speaks with his friend: 


a after making fewer than three calls b on the telephone on this occasion. 

12 Each morning, Ruma randomly selects and buys one of the four newspapers available at her local shop. Find
the probability that she buys:

a the same newspaper on two consecutive mornings
b three different newspapers on three consecutive mornings.

13 A coin is biased such that the probability that three tosses all result in heads is $\frac{125}{512}$. Find the probability of
obtaining no heads with three tosses of the coin.


14 In a group of five men and four women, there are three pairs of male and female business partners and three 
teachers, where no teacher is in a business partnership. One man and one woman are selected at random. 
Find the probability that they are: 


a both teachers 
b ina business partnership with each other 


c each in a business partnership but not with each other. 


(Ps] 15 A biased die in the shape of a pyramid has five faces marked 1, 2, 3, 4 and 5. The possible scores are 
1, 2, 3, 4 and 5 and P(x) = EA where k is a constant. 


a Find, in terms of k, the probability of scoring:

i 5 ii less than 3.
b The die is rolled three times and the scores are added together. Evaluate k and find the probability that the 
sum of the three scores is less than 5. 


(KS) 16 A game board is shown in the diagram. 


Players take turns to roll an ordinary fair die, then 
move their counters forward from ‘start’ a number 
of squares equal to the number rolled with the die. 
If a player’s counter ends its move on a coloured 
square, then it is moved back to the start. 


a Find the probability that a player’s counter is on ‘start’ after rolling the die:
i once ii twice. 
b Find the probability that after rolling the die three times, a player’s counter is on: 


i 18 ii 17 


It has long been common practice to write Q.E.D. at the point where a mathematical proof
or philosophical argument is complete. Q.E.D. is an initialism of the Latin phrase quod erat
demonstrandum, meaning ‘which is what had to be shown’.


Latin was used as the language of international communication, scholarship and science until well
into the 18th century.


Q.E.D. does not stand for Quite Easily Done!


A popular modern alternative is to write W⁵, an abbreviation of Which Was What Was Wanted.

Application of the multiplication law

The multiplication law given in Key point 4.8 can be used to show whether or not
events are independent. If we can show that $P(A \cap B) = P(A) \times P(B)$, then A and B are
independent, and vice versa. If, for example, P(X) = 0.3 and P(Y) = 0.4, then X and Y are
independent only if $P(X \cap Y) = 0.3 \times 0.4 = 0.12$.


WORKED EXAMPLE 4.7 


Events J, K and L are independent. Given that P(J) = 0.5, P(K) = 0.6 and $P(J \cap L) = 0.24$, find:


a $P(J \cap K)$ b P(L) c $P(K \cap L)$.

Answer
a $P(J \cap K) = 0.5 \times 0.6 = 0.3$

b $0.5 \times P(L) = 0.24$, so $P(L) = \frac{0.24}{0.5} = 0.48$

c $P(K \cap L) = 0.6 \times 0.48 = 0.288$

Examples of independence and non-independence that you may have come across are: 


• Enjoyment of sport is independent of gender if equal proportions of males and females
enjoy sport.

• If unequal proportions of employed and unemployed people own cars, then car
ownership is not independent of employment status; it is dependent on it.


WORKED EXAMPLE 4.8


In a group of 60 students, 27 are male (M) and 20 study History (H). The
Venn diagram shows the numbers of students in these and other categories.
One student is selected at random from the group. Show that the events
‘a male is selected’ and ‘a student who studies History is selected’ are
independent.

[Venn diagram: 60 students; sets M and H]

Answer
Does $P(M) \times P(H) = P(M \cap H)$?

We state the three probabilities:

$P(M) = \frac{27}{60}$, $P(H) = \frac{20}{60}$ and $P(M \cap H) = \frac{9}{60}$

Then we evaluate $P(M) \times P(H)$ to show whether or not it is equal to $P(M \cap H)$:

$$P(M) \times P(H) = \frac{27}{60} \times \frac{20}{60} = \frac{9}{60} = P(M \cap H)$$

The multiplication law holds for events M and H, therefore they are independent. Q.E.D.

When we show that two events are independent (or not), it is important to state a conclusion in words after doing the mathematics. A short sentence like the last line of Worked example 4.8 is sufficient. Writing Q.E.D., however, is optional.
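
The test used in Worked example 4.8 can be written as a small function. This is a hedged Python sketch: the function name `multiplication_law_holds` and the second set of numbers are assumptions introduced for illustration, not data from the text.

```python
from fractions import Fraction

def multiplication_law_holds(n_a, n_b, n_both, n_total):
    """True when P(A) x P(B) == P(A and B), using exact fractions of counts."""
    p_a = Fraction(n_a, n_total)
    p_b = Fraction(n_b, n_total)
    p_both = Fraction(n_both, n_total)
    return p_a * p_b == p_both

# Worked example 4.8: 27 males, 20 History students, 9 in both, 60 in total.
print(multiplication_law_holds(27, 20, 9, 60))   # True

# Hypothetical variation (not from the text): 10 in both instead of 9.
print(multiplication_law_holds(27, 20, 10, 60))  # False
```

Exact fractions are used because comparing rounded decimals for equality could give a misleading result.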


EXPLORE 4.2 


In Worked example 4.8, we showed that the events M and H are independent. 


For the 60 students in that particular group, we could show by similar methods
whether or not the three pairs of events M and H’, M’ and H and M’ and H’ are
independent.


Do this then discuss what you believe would be the most appropriate conclusion
to write.


EXERCISE 4D 


1 Y and Z are independent events. P(Y) = 0.7 and P(Z) = 0.9. Find $P(Y \cap Z)$.


2 Two independent events are M and N. Given that P(M) = 0.75 and
$P(M \cap N) = 0.21$, find P(N).


3 Independent events S and T are such that P(S) = 0.4 and P(T’) = 0.2. Find:
a P(SAT) b PASAT). 


4 A, B and C are independent events, and it is given that $P(A \cap B) = 0.35$,
$P(B \cap C) = 0.56$ and $P(A \cap C) = 0.4$.


a Express P(A) in terms of: 


i P(B) ii P(C). 
b Use your answers to part a to find: 
i P(B) ii P(A’) iii $P(B' \cap C')$.


5 In a class of 28 children, 19 attend drama classes, 13 attend singing lessons, and
six attend both drama classes and singing lessons. One child is chosen at random 
from the class. 


Event D is ‘a child who attends drama classes is chosen’. 


Event S is ‘a child who attends singing lessons is chosen’. 

a Illustrate the data in an appropriate table or diagram.


b Are events D and S independent? Give a reason for your answer. 


6 Each child in a group of 80 was asked whether they regularly read (R) or
regularly watch a movie (M). The results are given in the Venn diagram opposite.

[Venn diagram: 80 children; sets R and M; the value 18 is shown]

One child is selected at random from the group. Event R is ‘a child who regularly
reads is selected’ and event M is ‘a child who regularly watches a movie is
selected’.

Determine, with justification, whether events R and M are independent.


7 Two fair 4-sided dice, both with faces marked 1, 2, 3 and 4, are rolled.
Event A is ‘the sum of the numbers obtained is a prime number’.

Event B is ‘the product of the numbers obtained is an even number’.

a Find, in simplest form, the value of P(A), of P(B) and of $P(A \cap B)$.

b Determine, with justification, whether events A and B are independent.


c Give a reason why events A and B are not mutually exclusive.


8 Two ordinary fair dice are rolled.
Event X is ‘the product of the two numbers obtained is odd’.
Event Y is ‘the sum of the two numbers obtained is a multiple of 3’.


a Determine, giving reasons for your answer, whether X and Y are
independent.


b Are events X and Y mutually exclusive? Justify your answer.

9 A fair 8-sided die has faces marked 1, 2, 3, 4, 5, 6, 7 and 8. The score when the die
is rolled is the number on the face that the die lands on. The die is rolled twice.
Event V is ‘one of the scores is exactly 4 less than the other score’.

Event W is ‘the product of the scores is less than 13’.

Determine whether events V and W are independent, justifying your answer.

10 Two hundred children are categorised by gender and by whether or not they own
a bicycle. Of the 108 males, 60 own a bicycle, and altogether 90 children do not
own a bicycle.


a Tabulate these data.


b Determine, giving reasons for your answer, whether ownership of a bicycle is
independent of gender for these 200 children.


c What percentage of the females and what percentage of the males own
bicycles?


d Explain how your answers to part c confirm the result obtained in part b.

(PS) 11 At an election, it was found that people voted for Party X independently of their
income group. The following table shows that 12400 people from three income 
groups voted altogether, and that 7440 of them voted for Party X. 


Find the value of a, of b and of c. 


(KS) 12 The speed limit at a motorway junction is 120km/h. Information about the
speeds and directions in which 207 vehicles were being driven are shown in the
following table.

36  27  36  39
15  15  18  21

Providing evidence to support your answer, determine which vehicles’ speeds
were independent of their direction of travel.

We will study discrete random variables that arise from independent events in Chapter 6.


4.4 Conditional probability 
The word conditional is used to describe a probability that is dependent on some 
additional information given about an outcome or event. 


For example, if your friend randomly selects a letter from the word ACE,
then P(selects E) = 1/3.


However, if we are told that she selects a vowel, we now have a conditional probability that
is not the same as P(selects E).


This conditional probability is P(selects E, given that she selects a vowel) = 1/2.
Conditional probabilities are usually written using the symbol | to mean given that.


We read P(A| B) as ‘the probability that A occurs, given that B occurs’. 


A child is selected at random from a group of 11 boys and nine girls, and one of the girls is called Rose.
Find the probability that Rose is selected, given that a girl is selected.


Answer


P(Rose is selected | a girl is selected) = 1/9

The additional information, ‘given that a girl is selected’, reduces the number of possible selections
from 20 to 9, and Rose is one of those nine girls.
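
The idea of a reduced set of possible selections can be sketched in a few lines of Python. This is only an illustration: the helper name `conditional` and the child labels are made up for the example, and the method assumes that all outcomes are equally likely.

```python
from fractions import Fraction

# Hypothetical labels: 11 boys and 9 girls, one of the girls being Rose.
children = [("boy", f"boy{i}") for i in range(11)]
children += [("girl", "Rose")] + [("girl", f"girl{i}") for i in range(8)]

def conditional(event, given, outcomes):
    """P(event | given) for equally likely outcomes, found by counting."""
    reduced = [o for o in outcomes if given(o)]      # outcomes still possible
    favourable = [o for o in reduced if event(o)]    # those in the event too
    return Fraction(len(favourable), len(reduced))

p_rose_given_girl = conditional(
    lambda c: c[1] == "Rose",   # event: Rose is selected
    lambda c: c[0] == "girl",   # condition: a girl is selected
    children,
)
print(p_rose_given_girl)  # 1/9
```

The condition shrinks the list from 20 children to 9 before any counting is done, which mirrors the reasoning in the answer above.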
WORKED EXAMPLE 4.10


The following table shows the numbers of students in a class who study Biology (B) and who study Chemistry (C).

          B     B′    Total
C         9     8     17
C′        7     1     8
Total     16    9     25

Represent the data in a suitable Venn diagram, and find the
probability that a randomly selected student:

a studies Chemistry, given that they study Biology

b does not study Biology, given that they do not study Chemistry.


Answer

a P(C | B) = 9/16     16 study Biology and nine of these study Chemistry.

b P(B′ | C′) = 1/8    Eight do not study Chemistry and one of these does not study Biology.


WORKED EXAMPLE 4.11 


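Two children are selected at random from a group of five boys and seven girls. Find the probability that the
second child selected is a boy, given that the first child selected is:


a a boy b a girl.
Answer
a If the first child is a boy, four of the remaining 11 children are boys, so the probability is 4/11.
b If the first child is a girl, five of the remaining 11 children are boys, so the probability is 5/11.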
EXERCISE 4E


1 One letter is randomly selected from the six letters in the word BANANA. Find the probability that:
a an N is selected, given that an A is not selected
b an A is selected, given that an N is not selected.

2 One hundred children were each asked whether they have brothers
(B) and whether they have sisters (S). Their responses are given in
the Venn diagram opposite.


Find the probability that a randomly selected child has: 


a sisters, given that they have brothers 


b brothers, given that they do not have sisters 

c sisters or brothers, given that they do not have both. 

3 Two photographs are randomly selected from a pack of 12 colour and eight black and white photographs.
Find the probability that the second photograph selected is colour, given that the first is:

a colour b black and white.

4 The Venn diagram opposite shows the responses of 40 girls who were
asked if they have an interest in a career in nursing (N), dentistry (D)
or human rights (H).


a Find the probability that a randomly selected girl has an interest in: 
i human rights, given that she has an interest in nursing 
ii nursing, given that she has no interest in dentistry. 


b Describe any group of girls for whom dentistry is the least popular
career of interest. 


5 The quiz marks of 40 students are represented in the following bar chart.

[Bar chart: ‘No. students’ against ‘Marks’]

Two students are selected at random from the group. Find the probability that the second student:
a scored more than 5, given that the first student did not score more than 5 


b scored more than 7, given that the first student scored more than 7. 
6 The histogram shown represents the times taken,
in minutes, for 115 men to complete a task.
Two men are selected at random from the group.
Find the probability that the:

a first man took less than 1 minute, given that he
took less than 3 minutes
b second man took less than 6 minutes, given that
the first man took less than 1 minute.

[Histogram: horizontal axis ‘Time taken (min)’, marked from 1 to 10]


7 At an insurance company, 60% of the staff are male (M) and
70% work full-time (FT). The following Venn diagram shows
this and one other piece of information.

[Venn diagram of M and FT, showing a, b, c and the value 0.10]

a What information is given by the value 0.10 in the Venn
diagram?

b Find the value of a, of b and of c.

c An employee is randomly selected. Find:
i P(M | FT) ii P(FT | M′) iii P[(M ∩ FT) | (M ∪ FT)]


8 Two fair triangular spinners, both with sides marked 1, 2 and 3, are spun. Given that the sum of the two 
numbers spun is even, find the probability that the two numbers are the same. 


9 Two ordinary fair dice are rolled and the two numbers rolled are added together to give the score. Given that 
a player’s score is greater than 6, find the probability that it is not greater than 8. 


10 The circular archery target shown, on which 1, 2, 3 or 5 points can be
scored, is divided into four parts of unequal area by concentric circles. 
The radii of the circles are 3cm, 9cm, 15cm and 30cm. 


You may assume that a randomly fired arrow pierces just one of the 
four areas and is equally likely to pierce any part of the target. 


a Show that the probability of scoring 5 points is 0.01. 


b Find the probability of scoring 3 points, 2 points and 1 point with
an arrow.


c Given that an arrow does not score 5 points, find the probability
that it scores 1 point.


d Given that a total score of 6 points is obtained with two randomly
fired arrows, find the probability that neither arrow scores 1 point.


Independence and conditional probability 

At the beginning of Section 4.3, ‘independent’ was described in quite familiar terms. 
In general, we can use the multiplication law given in Key point 4.8 as the definition of 
‘independent’. However, a more formal definition can now be given. 


Events X and Y are said to be independent if each is unaffected by the occurrence 
of the other. If this is the case then the probability that X occurs is the same in two 
complementary situations: 


(i) when Y occurs, and (ii) when Y does not occur. 
From these, we can now say that X and Y are independent if and only if P(X | Y) = P(X | Y′).


Consider rolling an ordinary fair die and the events X and Y, as defined below. 

X: the outcome is a square number (1 or 4). 

Y: the outcome is an odd number (1, 3 or 5). 

First note that 1 is odd and square, so X and Y are not mutually exclusive; but are they
independent?


When Y occurs, the die shows 1, 3 or 5, so P(X | Y) = 1/3.

When Y does not occur, the die shows 2, 4 or 6, so P(X | Y′) = 1/3.

P(X) = 1/3 whether Y occurs or not.


P(X | Y) = P(X | Y′) means that P(X) is unaffected by the occurrence of Y.
Events X and Y are not mutually exclusive, but they are independent. 
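
As a quick check of the comparison P(X | Y) = P(X | Y′), the die example can be enumerated with Python sets. This is a minimal sketch; the helper name `cond` is an assumption made for the illustration, and the counting relies on the six faces being equally likely.

```python
from fractions import Fraction

faces = set(range(1, 7))   # an ordinary fair die
X = {1, 4}                 # square numbers
Y = {1, 3, 5}              # odd numbers
Y_not = faces - Y          # the complement Y'

def cond(a, b):
    """P(a | b) for equally likely faces: count within the reduced set b."""
    return Fraction(len(a & b), len(b))

p_x_given_y = cond(X, Y)
p_x_given_not_y = cond(X, Y_not)
independent = p_x_given_y == p_x_given_not_y
mutually_exclusive = len(X & Y) == 0

print(p_x_given_y, p_x_given_not_y)     # 1/3 1/3
print(independent, mutually_exclusive)  # True False
```

The same few lines can be reused for the pairs of events in Explore 4.3 by changing the sets.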


EXPLORE 4.3 


1 Consider rolling an ordinary fair die. In each case below, determine whether the 
given events are mutually exclusive, and whether they are independent. 
a ‘X: a number less than 3’, and ‘Y: an even number’.
b ‘A: a number that is 4 or more’, and ‘B: a number that is not more than 4’.


2 Now consider rolling a fair 12-sided die, numbered from 1 to 12. In each case 
below, determine whether the given events are mutually exclusive, and whether 
they are independent. 

a ‘X: an even number’, and ‘Y: a factor of 28’.
b ‘A: a prime number’, and ‘B: a multiple of 3’.
c ‘F: a factor of 12’, and ‘M: a multiple of 5’.


3 An integer from 1 to 20 inclusive is selected at random. Three events are defined
as follows:
A: the number is a multiple of 3.
B: the number is a factor of 72.
C: the number has exactly two digits, and at least one of those digits is a 1. 
Determine which of the three possible pairs of these events is independent. 


4.5 Dependent events and conditional probability 

Two events are mutually dependent when neither can occur without being affected by
the occurrence of the other. An example of this is when we make selections without
replacement; that is, when probabilities for the second selection depend on the outcome of
the first selection.


The multiplication law for independent events (see Key point 4.8) is a special case of the
multiplication law of probability.


The multiplication law of probability is used to find the probability that ‘this and that’ 
occurs when the events involved might not be independent. 
WORKED EXAMPLE 4.12


Two children are randomly selected from 11 boys (B) and 14 girls (G). Find the probability that the selection 
consists of: 


a two boys b a boy and a girl, in any order.


Answer 
a P(2 boys) = P(B1 and B2)
= P(B1) × P(B2 | B1)
= 11/25 × 10/24
= 11/60


b P(a boy and a girl) = P(B1 and G2) + P(G1 and B2)
= P(B1) × P(G2 | B1) + P(G1) × P(B2 | G1)
= 11/25 × 14/24 + 14/25 × 11/24
= 77/150
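
The arithmetic in this example can be checked with exact fractions. The following sketch simply multiplies along each route, as in the answer; the variable names are chosen for this illustration only.

```python
from fractions import Fraction

boys, girls = 11, 14
total = boys + girls

# P(B1) x P(B2 | B1): one boy fewer and one child fewer for the second pick.
p_two_boys = Fraction(boys, total) * Fraction(boys - 1, total - 1)

# A boy and a girl can arrive in two orders, so add the two routes.
p_boy_then_girl = Fraction(boys, total) * Fraction(girls, total - 1)
p_girl_then_boy = Fraction(girls, total) * Fraction(boys, total - 1)
p_one_of_each = p_boy_then_girl + p_girl_then_boy

print(p_two_boys)     # 11/60
print(p_one_of_each)  # 77/150
```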


WORKED EXAMPLE 4.13


Every Saturday, a man invites his sister to the theatre or to the cinema. 70% of his invitations are to the theatre 
and 90% of these are accepted. His sister rejects 40% of his invitations to the cinema. 


Find the probability that the brother’s invitation is accepted on any particular Saturday. 


Answer 
The probability that the sister accepts depends on where she is invited to go. This tells us that our calculations 
must involve conditional probabilities. 


[Tree diagram]
T (0.7): accepts (0.9)   P(T and accepts) = 0.7 × 0.9 = 0.63
         rejects (0.1)   P(T and rejects) = 0.7 × 0.1 = 0.07
C (0.3): accepts (0.6)   P(C and accepts) = 0.3 × 0.6 = 0.18
         rejects (0.4)   P(C and rejects) = 0.3 × 0.4 = 0.12


P(accepts) = P(T and accepts) + P(C and accepts)
= 0.63 + 0.18
= 0.81
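
The tree diagram calculation can be written as a short sum over the two branches. This is a sketch only; the dictionary names are assumptions for the illustration, and fractions are used so that the result is exact.

```python
from fractions import Fraction

# First-stage branches: where the invitation is to.
p_invite = {"theatre": Fraction(7, 10), "cinema": Fraction(3, 10)}

# Second-stage branches: P(accepts | venue). Rejecting 40% of cinema
# invitations means accepting 60% of them.
p_accept_given = {"theatre": Fraction(9, 10), "cinema": Fraction(6, 10)}

# Multiply along each branch, then add the branches that end in 'accepts'.
p_accept = sum(p_invite[v] * p_accept_given[v] for v in p_invite)

print(p_accept, float(p_accept))  # 81/100 0.81
```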
We can use the multiplication law of probability to find conditional probabilities.


We know that P(A ∩ B) = P(A) × P(B | A), so P(B | A) can be found when P(A ∩ B) and
P(A) are known.


P(B | A) = P(A ∩ B) / P(A) and P(A | B) = P(A ∩ B) / P(B)


The symbol ≡ means ‘is identical to’.


WORKED EXAMPLE 4.14 


Given that P(A ∩ B) = 0.36 and P(B) = 0.9, find P(A | B).


Answer

P(A | B) = P(B ∩ A) / P(B)
= P(A ∩ B) / P(B)
= 0.36 / 0.9
= 0.4


WORKED EXAMPLE 4.15


An ordinary fair die is rolled. Find the probability that the number obtained is
prime, given that it is odd.
Answer


P(prime | odd) = P(odd and prime) / P(odd)
= (2/6) / (3/6)
= 2/3


The odd numbers are 1, 3 and 5, so P(odd) = 3/6.

The odd prime numbers are 3 and 5, so P(odd and prime) = 2/6.

Alternatively, two of the three odd numbers on a die are prime, so
P(prime | odd) = 2/3.

WORKED EXAMPLE 4.16


A boy walks to school (W) 60% of the time and cycles (C) 40% of the time. He is
late to school (L) on 5% of the occasions that he walks, and he is late on 2% of the
occasions that he cycles.


Given that he is late to school, find the probability that he cycles; that is, find 
P(C | L). 
Answer

[Tree diagram]
W (0.6): L (0.05)    P(W and L) = 0.030
         L′ (0.95)   P(W and L′) = 0.570
C (0.4): L (0.02)    P(C and L) = 0.008
         L′ (0.98)   P(C and L′) = 0.392


P(L) = P(W and L) + P(C and L)
= 0.030 + 0.008
= 0.038


P(C | L) = P(C and L) / P(L)
= 0.008 / 0.038
= 4/19 or 0.211
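
The two steps of this answer, finding P(L) from the tree and then dividing, can be sketched in Python as follows. The dictionary names are assumptions made for the illustration.

```python
from fractions import Fraction

p_mode = {"W": Fraction(60, 100), "C": Fraction(40, 100)}       # walk, cycle
p_late_given = {"W": Fraction(5, 100), "C": Fraction(2, 100)}   # P(L | mode)

# Step 1: P(L) is the sum of the two branches that end in 'late'.
p_late = sum(p_mode[m] * p_late_given[m] for m in p_mode)

# Step 2: P(C | L) = P(C and L) / P(L)
p_c_and_late = p_mode["C"] * p_late_given["C"]
p_c_given_late = p_c_and_late / p_late

print(p_late)                           # 19/500
print(p_c_given_late)                   # 4/19
print(round(float(p_c_given_late), 3))  # 0.211
```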


EXERCISE 4F 


1 Two ties are taken at random from a bag of three plain and five striped ties. By 
use of a tree diagram, or otherwise, find the probability that both ties are: 


a plain b striped. 


2 There are four toffee sweets and seven nutty sweets in a girl’s pocket. Find the 
probability that two sweets, selected at random, one after the other, are not the 
same type. 


3 On a library shelf there are seven novels, three dictionaries and two atlases.
Two books are randomly selected without replacement from these. Find the 
probability that the selected books are: 


a both novels b both dictionaries or both atlases. 

4 A woman travels to work by bicycle 70% of the time and by scooter 30% of the
time. If she uses her bicycle she is late 3% of the time but if she uses her scooter
she is late only 2% of the time.


a Find the probability that the woman is late for work on any particular day.
b Given that the woman expects not to be late on approximately 223 days in a
year, find the number of days in a year on which she works.

5 Two children are randomly selected from a group of five boys and seven girls. 
Determine which is more likely to be selected: 
a two boys or two girls? 


b the two youngest girls or the two oldest boys? 
6 A boy has five different pairs of shoes mixed up under his bed. Find the
probability that when he selects two shoes at random they can be worn as a 
matching pair. 


7 A bag contains five 4cm nails, six 7cm nails and nine 10cm nails. Find the 
probability that two randomly selected nails: 
a have a total length of 14cm
b are both 7cm long, given that they have a total length of 14cm. 

8 Yvonne and Novac play two games of tennis every Saturday. Yvonne has a 65%
chance of winning the first game and, if she wins it, her chances of winning
the second game increase to 70%. However, if she loses the first game, then her
chances of winning the second game decrease to 55%. Find the probability that
Yvonne:


a loses the second game 


b wins the first game, given that she loses the second game. 


9 a Given P(X ∩ Y) = 0.13 and P(X) = 0.65, find P(Y | X).
b Given P(M ∩ N) = 0.27 and P(N | M) = 0.81, find P(M).
c Given P(V ∩ W) = 0.35 and P(W′) = 0.60, find P(V | W).

10 When a customer at a furniture store makes a purchase, there is a 15% chance that
they purchase a bed. Given that 4.2% of all customers at the store purchase a bed,
find the probability that a customer does not make a purchase at the store.


11 A number between 10 and 100 inclusive is selected at random. Find the probability 
that the number is a multiple of 5, given that none of its digits is a 5. 


12 Three of Mr Jumbillo’s seven children, who include one set of twins, are selected 
at random. 
a Calculate the probability that exactly one of the twins is selected.
b Given that exactly three of the children are girls, find the probability that the
selection of three children contains more girls than boys.


13 Anya calls Zara once each evening before she goes to
bed. She calls Zara’s mobile phone with probability 0.8 or
her landline. The probability that Zara answers her mobile
phone is 0.74, and the probability that she answers
her landline is y. This information is displayed in
the tree diagram shown.

[Tree diagram]
Calls mobile (0.8): Answers (0.74)
                    Does not answer
Calls landline:     Answers (y)
                    Does not answer


a Given that Zara answers 68% of Anya’s calls, find the value of y. 


b Given that Anya’s call is not answered, find the probability that it is made to 
Zara’s landline. 
14 Every Friday, Arif offers to take his
sons to the beach or to the park. The
sons refuse an offer to the beach with
probability 0.65 and accept an offer
to the park with probability 0.85. The
probability that Arif offers to take them to
the beach is x. This information is shown
in the tree diagram.

[Tree diagram]
Beach (x): Accept
           Refuse (0.65)
Park:      Accept (0.85)
           Refuse


a Find the value of x, given that 33% of 
Arif’s offers are refused. 


b Given that Arif’s offer is accepted, find the probability that he offers to take 
his sons to the park. 


(Ps) 15 Two children are selected from a group in which there are 10 more boys than 
girls. Given that there are 756 equiprobable ordered selections that can occur, 
find the probability that two boys or two girls are selected. 


(Ps) 16 There is a 43% chance that Riya meets her friend Jasmine when she travels to work. 
Given that Riya walks to work and does not meet Jasmine 30% of the time, and that 
she travels to work by a different method and meets Jasmine 25% of the time, find 
the probability that Riya walks to work on any particular day. 


(Ps) 17 Aaliyah buys a randomly selected magazine that contains a crossword puzzle on
five randomly chosen days of each week. On 84% of the occasions that she buys a
magazine, she attempts its crossword, which she manages to complete 60% of the
time. Find the probability that, on any particular day, Aaliyah does not complete
the crossword in a magazine.

We will study further techniques for calculating probabilities using
permutations and combinations in Chapter 5, Section 5.4.


Probabilities are assigned on a scale from 0 (impossible) to 1 (certain).
When one object is randomly selected from n objects, P(selecting any particular object) = 1/n


P(event) = (Number of favourable equally likely outcomes) / (Total number of equally likely outcomes)


P(A) + P(not A) = 1 or P(A) + P(A′) = 1


In n trials, event A is expected to occur n × P(A) times.

A ∪ B means ‘A or B’ and A ∩ B means ‘A and B’.

Mutually exclusive events have no common favourable outcomes.

For mutually exclusive events A and B, P(A or B) = P(A ∪ B) = P(A) + P(B)
Non-mutually exclusive events have at least one common favourable outcome.

For any two events A and B, P(A or B) = P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
Independent events can occur without being affected by the occurrence of each other.


Events A and B are independent if and only if P(A and B) = P(A ∩ B) = P(A) × P(B), such that
P(A | B) = P(A | B′).


For any two events A and B, P(A and B) = P(A ∩ B) = P(A) × P(B | A) and
P(B | A) = P(A ∩ B) / P(A).
Four quality-control officers were asked to test 1214 randomly selected electronic components from a company’s
production line, and to report the proportions that they found to be defective. The proportions reported were: 
2 ASO and 

332° 411° 187 283° 

These figures confirm what the manager thought; that about k% of the components produced are defective. 


Of the 7150 components that will be produced next month, approximately how many does the manager expect 
to be defective? [2] 


Three referees are needed at an international tournament and there are 12 to choose from: three from Bosnia, 
four from Chad and five from Denmark. If the referees are selected at random, find the probability that at 
least two of them are from the same country. [3] 


The diagram opposite gives details about a company’s 115
employees. For example, it employs four unqualified, part-time
females.

[Diagram: labels ‘Part-time’, ‘Full-time’, ‘Male’, ‘Female’ and ‘Unqualified’]

Two employees are selected at random. Find the probability that:


a one is a qualified male and the other is an unqualified female [2]


b both are unqualified, given that neither is employed part-time. [3]
The numbers of books read in the past 3 months by the members of a reading club are shown in the following table. 


<2 2-4 5-7 8-9 10 >10 
3 S 22 7 2 1 


Find the probability that three randomly selected members have all read fewer than eight books, given that 
they have all read more than four books. [3] 


When they are switched on, certain small devices independently produce outputs of 1, 2 or 3 volts with respective
probabilities of 0.3, 0.6 and 0.1. Find the probability that three of these devices produce an output with a sum 
of 5 or 6 volts. [4] 


One hundred people are attending a conference. The following Venn diagram shows how many are male (M), 
have brown eyes (BE) and are right-handed (RH). 


55 


87 


a Given that there are 43 males with brown eyes, 42 right-handed males and 46 right-handed people with
brown eyes, copy and complete the Venn diagram. [3] 


b Two attendees are selected at random. Find the probability that: 
i they are both females who are not right-handed [2] 


ii exactly one of them is right-handed, given that neither of them have brown eyes. [3] 
A student travels to college by either of two routes, A or B. The
probability that they use route A is 0.3, and the probability that they are
passed by a bus on their way to college on any particular day is 0.034.
They are twice as likely to be passed by a bus when they use route B
as when they use route A.

[Tree diagram]
A (0.3): Bus (x)
         No bus
B (0.7): Bus (y)
         No bus

a Use the tree diagram opposite to form and solve a pair of
simultaneous equations in x and y. [3]


b Find the probability that the student uses route B, given that they are 
not passed by a bus on their way to college. [3] 


Two ordinary fair dice are rolled. If the first shows a number less than 3, then the score is the mean of the
numbers obtained; otherwise the score is equal to half the absolute (non-negative) difference between the 
numbers obtained. Find the probability that the score is: 


a positive [3] 
b greater than 1, given that it is less than 2 [3] 
c less than 2, given that it is greater than 1. [3] 


Three friends, Rick, Brenda and Ali, go to a football match but forget to say which entrance to the ground they 
will meet at. There are four entrances, A, B, C and D. Each friend chooses an entrance independently.


e The probability that Rick chooses entrance A is L. The probabilities that he chooses entrances B, C or D 
are all equal. 


e Brenda is equally likely to choose any of the four entrances. 


e The probability that Ali chooses entrance C is Z and the probability that he chooses entrance D is 2. 


The probabilities that he chooses the other two entrances are equal. 


i Find the probability that at least 2 friends will choose entrance B. [4]
ii Find the probability that the three friends will all choose the same entrance. [4]
Cambridge International AS & A Level Mathematics 9709 Paper 61 Q5 November 2010


Maria chooses toast for her breakfast with probability 0.85. If she does not choose toast then she has a bread
roll. If she chooses toast then the probability that she will have jam on it is 0.8. If she has a bread roll then the 
probability that she will have jam on it is 0.4. 


i Draw a fully labelled tree diagram to show this information. [2] 
ii Given that Maria did not have jam for breakfast, find the probability that she had toast. [4] 
Cambridge International AS & A Level Mathematics 9709 Paper 62 Q3 November 2009 


Ronnie obtained data about the gross domestic product (GDP) and the birth rate for 170 countries. He 
classified each GDP and each birth rate as either ‘low’, ‘medium’ or ‘high’. The table shows the number of 
countries in each category. 


One of these countries is chosen at random. 
i Find the probability that the country chosen has a medium GDP. [1] 


ii Find the probability that the country chosen has a low birth rate, given that it does not have a
medium GDP. [2] 


iii State with a reason whether or not the events ‘the country chosen has a high GDP’ and ‘the country 
chosen has a high birth rate’ are exclusive. [2] 


One country is chosen at random from those countries which have a medium GDP and then a different 
country is chosen at random from those which have a medium birth rate. 


iv Find the probability that both countries chosen have a medium GDP and a medium birth rate. [3] 
Cambridge International AS & A Level Mathematics 9709 Paper 63 Q3 November 2012 


Three boxes, A, B and C, each contain orange balls and blue balls, as shown.


box B box C 


a A girl selects a ball at random from a randomly selected box. Given that she selects a blue ball, find the 


probability that it is from box C. [3] 
b A boy randomly selects one ball from each box. Given that he selects exactly one blue ball, find the 

probability that it is from box A. [4] 
A and B are independent events. If P(A) = 0.45 and P(B) = 0.64, find P[(A ∪ B)′]. [2]


Two ordinary fair dice are rolled. 
Event A is ‘the sum of the numbers rolled is 2, 3 or 4’. 


Event B is ‘the absolute difference between the numbers rolled is 2, 3 or 4’. 


a When the dice are rolled, they show a 1 and a 3. Explain why this result shows that events A and B are not
mutually exclusive. [1]
b Explain how you know that P(A ∩ B) = 1/18. [2]
c Determine whether events A and B are independent. [3] 


A and B are independent events, where P(A ∩ B′) = 0.14, P(A′ ∩ B) = 0.39 and
P(A ∩ B) < 0.25. Use a Venn diagram, or otherwise, to find P[(A ∪ B)′]. [4]


In a survey, adults were asked to answer yes or no to the question ‘Do
you regularly watch the evening TV news?’ Some of the results from the
survey are detailed in the Venn diagram opposite.

[Venn diagram: labels ‘Male’ and ‘No’; the value 105 is shown]

One adult is selected at random and it is found that the events ‘a
female is selected’ and ‘a person who regularly watches the evening
TV news is selected’ are independent. Find the number of adults
questioned in the survey. [4]
17 Bookings made at a hotel include a room plus any meal combination
of breakfast (B), lunch (L) and supper (S). The Venn diagram opposite 
shows the number of each type of booking made by 71 guests on Friday. 
a A guest who has not booked all three meals is selected at random. 
Find the probability that this guest: 


i has booked breakfast or supper [2] 
ii has not booked supper, given that they have booked lunch. [2] 


b Find the probability that two randomly selected guests have both 
booked lunch, given that they have both booked at least two meals. [3] 


(Ps) 18 Three strangers meet on a train. Assuming that a person is equally likely to be born in any of the 
12 months of the year, find the probability that at least two of these three people were born in the 
same month of the year. [4] 


(Ps) 19 A box contains three black and four white chess pieces. Find the probability that a random selection of five 
chess pieces, taken one at a time without replacement, contains exactly two black pieces which are selected 
one immediately after the other. [4] 


(Ps) 20 The following table shows the numbers of IGCSE (I) and A Level (A) examinations passed by a group of 
university students. 


a For a student selected at random, find:
i P(I + A = 11 | A < 4) [2]
ii P(I − A > 5 | I + A > 10) [3]


b Six students who all have at least three A Level passes are selected at random. Find the greatest possible 
range of the total number of IGCSE passes that they could have. [2] 
In this chapter you will learn how to:


■ solve simple problems involving selections
■ solve problems about arrangements of objects in a line, including those involving repetition and
restriction
■ evaluate probabilities by calculation using permutations or combinations.


Chapter 5: Permutations and combinations 


Where it comes from: Chapter 4, Section 4.4.

What you should be able to do: Recognise and calculate conditional probabilities.

Check your skills: An experiment has 10 equally likely outcomes: three are favourable to
event A, five are favourable to event B and four are favourable to neither A nor B.
Find P(A | B) and P(B | A).


Simple situations with millions of possibilities 

This topic is concerned with selections and arrangements of objects. Permutations and 
combinations appear in many complex modern applications: transport logistics; relationships 
between proteins in genetic engineering; sorting algorithms in computer science; and 
protecting computer passwords and e-commerce transactions in cryptography. 


A selection of objects is called a combination if the order of selection does not matter;
however, if the order of selection does matter, then the selection is called a permutation. 


To make the difference between a permutation and a combination clear, we can describe 
them as follows. 


• A combination is a way of selecting objects.
There are three combinations of two letters from A, B and C. These are A and B, A and C,
and B and C.

• A permutation is a way of selecting objects and arranging them in a particular order.
There are six permutations of two letters from A, B and C. These are
AB, BA, AC, CA, BC and CB.


Each letter A to Z is encrypted 
(or transformed) to a fixed distinct 
letter using its position in the alphabet 
(A =1,B=2, C =3....). By doing this, 
the password SATURN is encrypted as 
ECHKBP. 


This information gives us six clues to
work out the method of encryption
(e.g. S → E means 19 → 5, A → C means
1 → 3, and so on).

[Photograph: An Enigma Encryption Machine, circa 1940.]


Investigate the method of encryption and then find the password that is encrypted as 
UJSNOL. 


There are over 27 million possible passwords, so the probability that a random guess 
is correct is approximately zero. 
5.1 The factorial function
In this chapter, we will frequently need to write and evaluate expressions such as 4 × 3 × 2 × 1.


A shorthand method of doing this is to use the factorial function: 4 × 3 × 2 × 1 is called ‘four
factorial’ and is written 4! On most calculators, the factorial function appears as n! or x!
factorial’ and is written 4! On most calculators, the factorial function appears as n! or x! 


As examples: 
7! = 7 × 6 × 5 × 4 × 3 × 2 × 1 = 5040


6! 6x5x4x3x2x1 
ays aa 


5! − 4! = 4!(5 − 1) = 4! × 4 = 96


3!/0! = (3 × 2 × 1)/1 = 6


n! = n(n − 1)(n − 2) ... × 3 × 2 × 1, for any integer n > 0. 0! = 1
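
If a calculator is not to hand, the examples above can be checked with Python's standard `math.factorial` function. The small function `factorial_by_product` is included only to illustrate the definition as a repeated product; its name is made up for this sketch.

```python
import math

def factorial_by_product(n):
    """n! as the product n x (n-1) x ... x 2 x 1, with 0! taken to be 1."""
    result = 1
    for k in range(2, n + 1):
        result *= k
    return result

seven_factorial = math.factorial(7)
difference = math.factorial(5) - math.factorial(4)
quotient = math.factorial(3) // math.factorial(0)

print(seven_factorial)          # 5040
print(difference)               # 96
print(quotient)                 # 6
print(factorial_by_product(0))  # 1
```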


The following figure shows values of n! in sequence, where the next term is obtained by
division. From this it is clear that 0! must be equal to 1.


3!  --(÷3)-->  2!  --(÷2)-->  1!  --(÷1)-->  0!
6              2              1              1
1 Without using a calculator, find the value of: 
5! 4! 10! 9! 20! 13! 
31 b a c 7x4!4+21~x3! d Z 7 e Tg! it 
2 Use your calculator to find the smallest value of n for which: 
a n!>i000000 b 5!x6!<n! c (n!)!>107 
3 Use your calculator to find the largest value of n for which: 
n! n! 
—— <80 b 1.5x10!?-n!>0 c ——<500 
500000 i (n—2)! 


! 


. ; ; ; 1x 
4 Express, in as many different ways as possible, the numbers 144, 252 and 14 in the form Å , where none of 


a, b or c is equal to 0 or to 1.
5 Express the area of a 53cm by 52cm rectangle using factorials. 


6 Two cubical boxes measure 25cm by 24cm by 23cm, and 8cm by 7cm by 6cm. Express the difference between 
their volumes using factorials. 


7 Eight children each have seven boxes of six eggs and each egg is worth $0.09. Write the total value of all these
eggs in dollars, using factorials. 




There is a famous legend about the Grand 
Vizier in Persia who invented chess. 


The King was so delighted with the new 
game that he invited the Vizier to name 
his own reward. The Vizier replied that, 
being a modest man, he wanted only one 
grain of wheat on the first square of a
chessboard, two grains on the second, 
four on the third, and so on, with twice 

as many grains on each square as on the 
previous square. The innumerate King 
agreed, not realising that the total number 
of grains on all 64 squares would be 2^64 − 1, or 1.84 × 10^19, which is equivalent to the world’s present
wheat production for the next 150 years.


Although the number 2^64 − 1 is extremely large, it is only about one-third of 21 factorial.


As a challenge, try showing that 2^0 + 2^1 + 2^2 + ... + 2^63 = 2^64 − 1 without using a formula.
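
The figures quoted in this story can be confirmed numerically. The sketch below only checks the values; it does not replace the reasoning asked for in the challenge.

```python
import math

# One grain on the first square, doubling on each of the 64 squares.
total_grains = sum(2**k for k in range(64))

matches_formula = total_grains == 2**64 - 1
ratio_to_21_factorial = total_grains / math.factorial(21)

print(total_grains)                     # 18446744073709551615
print(matches_formula)                  # True
print(round(ratio_to_21_factorial, 2))  # 0.36
```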


5.2 Permutations 
We can make a permutation by taking a number of objects and arranging them in a line. 
For example, the two possible permutations of the digits 5 and 9 are the numbers 59 and 95. 


Although there are several methods that we can use to find the number of possible 
permutations of objects, all methods involve use of the factorial function. 


Permutations of n distinct objects 

The number of permutations of n distinct objects is denoted by ⁿPₙ, and there are n!
permutations that can be made. For example, there are ²P₂ = 2! = 2 permutations of the two
digits 5 and 9, as we have just seen.


KEY POINT 5.2


The number of permutations of n distinct objects is ⁿPₙ = n! = n(n − 1)(n − 2) ... × 3 × 2 × 1, for any
integer n > 0.
integer n>0. 


Consider all the three-digit numbers that can be made by arranging the digits 5, 6 and 7. 
In this simple case, we can make a list to show there are six possible three-digit numbers. 
These are 567, 576, 657, 675, 756 and 765. 


The following tree diagram gives another method of showing the six possible arrangements of the three digits.

left   middle   right   number
5      6        7       567
5      7        6       576
6      5        7       657
6      7        5       675
7      5        6       756
7      6        5       765

Unfortunately, writing out lists and constructing tree diagrams to find numbers of possible
arrangements of objects are suitable methods only for small numbers of objects. Imagine 
listing all the possible arrangements of seven different letters; there would be over 5000 on 
the list and a tree diagram would have over 5000 branches at its right-hand side! 


Clearly, a more practical method for finding numbers of arrangements is needed. This is 
the primary use of the factorial function. 


We can show that six three-digit numbers can be made from 5, 6 and 7 by considering how many 
choices we have for the digit that we place in each position in the arrangement. If we first place 
a digit at the left side, we have three choices. Next, we place a digit in the middle (two choices).
Finally, we place the remaining digit at the right side, as shown in the following diagram. 
3 x 2 x 1 
choices choices choices 


The numbers above the lines in the diagram are not the digits that are being 
arranged — they are the numbers of choices that we have for placing the three digits. 


The three digits can be arranged in $3 \times 2 \times 1 = 3! = {}^3P_3 = 6$ ways.


The seven letters mentioned previously can be arranged in 
7x6x5x4x3x2x1=7!=5040 ways. 
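For small cases, the listing method and the factorial count can be compared by machine. This is a minimal sketch using `itertools.permutations` from the Python standard library; the letters A to G are an arbitrary stand-in for "seven different letters".

```python
from itertools import permutations
from math import factorial

# List every three-digit number made from the digits 5, 6 and 7.
digits = [5, 6, 7]
numbers = sorted(int("".join(map(str, p))) for p in permutations(digits))
print(numbers)                          # [567, 576, 657, 675, 756, 765]
print(len(numbers) == factorial(3))     # True

# Count (without storing) the arrangements of seven different letters.
seven_letter_count = sum(1 for _ in permutations("ABCDEFG"))
print(seven_letter_count)               # 5040
```

Listing is instant here, but the count grows as n!, which is why the factorial formula is preferred for larger n.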


WORKED EXAMPLE 5.1


In how many different ways can five boys be arranged in a row?


Answer

$5 \times 4 \times 3 \times 2 \times 1 = 5! = {}^5P_5 = 120$ ways. We multiply together the number of choices for each of the five positions, working from left to right.

We could just as easily work from right to left, giving $1 \times 2 \times 3 \times 4 \times 5 = 5!$ ways.


WORKED EXAMPLE 5.2


In how many ways can nine elephants and four mice be arranged in a line?


Answer

$13! = {}^{13}P_{13} = 6227020800$ ways. The nine elephants and four mice are distinct, so we are arranging 13 different animals.

Large number answers can be given more accurately than to 3 significant figures.


EXERCISE 5B


1 In how many ways can the six letters A, B, C, D, E and F be arranged in a row? 


2 From a standard deck of 52 playing cards, find how many ways there are of arranging in a row: 


a all 52 cards b the four kings c the 13 diamonds. 


3 In how many different ways can the following stand in a line?


a two women b six men c eight adults. 


4 In how many different ways can the following sit in a row on a bench?


a four girls b three boys c four girls and three boys. 


5 Seven cars and x vans can be parked in a line in 39 916 800 ways. Find the number of ways in which five cars 
and x + 2 vans can be parked in a line. 


6 A woman has 10 children. She arranges 11 chairs in a row and sits on the chair in the middle. If her youngest 
child sits on the adjacent chair to her left, in how many ways can the remaining children be seated? 


PS 7 A group of n boys can be arranged in a line in a certain number of ways. By adding two more boys to the
group, the number of possible arrangements increases by a factor of 420. Find the value of n. 


Permutations of n objects with repetitions 
When n objects include repetitions (i.e. when they are not all distinct), there will be fewer 
than ${}^nP_n$ permutations, so an adjustment to the use of the factorial function is needed.


Consider making five-letter arrangements with A, A, B, C and D. 

To simplify the problem, we can distinguish the repeated A by writing the letters as
A₁, A₂, B, C and D.

A₁A₂BCD is the same arrangement as A₂A₁BCD.

DA₁CA₂B is the same arrangement as DA₂CA₁B, and so on.

Each time we swap A₁ and A₂, we obtain the same arrangement.


If the five letters were distinct, there would be ${}^5P_5 = 5! = 120$ arrangements, but the number is reduced to half $\left(= \frac{1}{2!}\right)$ of the total arrangements because the two repeated letters can be placed in 2! ways in any particular arrangement without changing that arrangement.


Letters to be arranged = 5; number of the same letter = 2.


There are $\frac{{}^5P_5}{2!} = \frac{5!}{2!} = 60$ five-letter arrangements that can be made.


KEY POINT 5.3


The number of permutations of n objects, of which p are of one type, q are of another type, r are of another type, and so on, is $\frac{{}^nP_n}{p! \times q! \times r! \times \ldots} = \frac{n!}{p! \times q! \times r! \times \ldots}$, where $p + q + r + \ldots = n$.
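The formula in Key point 5.3 can be sketched as a short function and compared with a brute-force listing for the letters A, A, B, C and D. The function name `count_arrangements` is an illustrative choice, not notation from the text.

```python
from collections import Counter
from itertools import permutations
from math import factorial

def count_arrangements(items):
    """Return n! / (p! * q! * r! * ...) for a sequence with repeated items."""
    total = factorial(len(items))
    for repeats in Counter(items).values():
        total //= factorial(repeats)   # divide out each group of identical items
    return total

formula_count = count_arrangements("AABCD")

# Brute force: generate all 5! orderings, then discard duplicates with a set.
brute_force_count = len(set(permutations("AABCD")))

print(formula_count, brute_force_count)   # 60 60
```

The brute-force check is only practical for short words; the formula works for any length.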




WORKED EXAMPLE 5.3 


The capital of Burkina Faso is OUAGADOUGOU. Find the number of distinct arrangements of all the letters 
in this word. 


Answer 


$\frac{11!}{3! \times 3! \times 2! \times 2!} = 277200$ arrangements. 11 letters are to be arranged, with repeats of three Os, three Us, two As and two Gs. In the formula of Key point 5.3, excluding 1! for the D in the denominator does not change our answer.


EXERCISE 5C 


1 Find the number of distinct arrangements of all the letters in these words: 
a TABLE b TABLET c COMMITTEE 
d MISSISSIPPI e HULLABALLOO. 


2 Find how many six-digit numbers can be made from these sets of digits: 
a 1, 1, 1, 1, 1 and 3   b 2, 2, 2, 7, 7 and 7   c 5, 6, 6, 6, 7 and 7   d 8, 8, 9, 9, 9 and 9.

3 A girl has 20 plastic squares. There are five identical red squares, seven identical blue squares and eight identical green squares. By placing them in a row, joined edge-to-edge, find how many different arrangements she can make using:

a one square of each colour   b the five red squares only
c all of the blue and green squares   d all of the 20 squares.


4 Two students are asked to find how many ways there are to plant two trees and three bushes in a row. The first student gives 5! = 120, and the second gives $\frac{5!}{2! \times 3!} = 10$. Decide who you agree with and explain the error made by the other student.


5 Ten coins are placed in a row on a table, each showing a head or a tail. 
a How many different arrangements of heads and/or tails are possible? 
b Of the arrangements in part a, find how many have:
i five heads and five tails showing ii more heads than tails showing. 
6 There are 420 possible arrangements of all the letters in a particular seven-letter word. Give a description of 
the letters in this word. 
7 Find the number of distinct five-letter arrangements that can be made from: 
a two As and three Bs b two identical vowels and three Bs 


c two identical vowels and any three identical consonants.

Consider the number of distinct arrangements of the 16 letters in the word COUNTERCLOCKWISE: there are close to $8.72 \times 10^{11}$. We rarely meet such numbers in our daily lives, so we are likely to see this as just a very large number whose true size we cannot really comprehend until it is put into some human context. For example, if everyone on Earth over the age of 14 (i.e. about $5.46 \times 10^{9}$ people) contributed one new arrangement of the word every day starting on 1st January, we would complete the list of arrangements around 9th June.

The calculation for this is $\frac{8.72 \times 10^{11}}{5.46 \times 10^{9}} \approx 160$ days.

Devise a way of expressing the number of distinct arrangements of the letters in the word PNEUMONOULTRAMICROSCOPICSILICOVOLCANOCONIOSIS (which is the full name for the disease known as silicosis, and is the longest word in any major English language dictionary) in a way that is meaningful to human understanding.

There are, for example, $3.15 \times 10^{7}$ seconds in a year, about $7.48 \times 10^{9}$ people on Earth, and the masses of the Earth and Sun are $5.97 \times 10^{24}$ and $1.99 \times 10^{30}$ kg, respectively.

For other options, perform a web search for large numbers.
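The COUNTERCLOCKWISE figures can be reproduced with a few lines of Python. This is a rough sketch: the population figure is the one quoted in the text, and the helper `count_arrangements` is a hypothetical name for the repeated-letters formula of Key point 5.3.

```python
from collections import Counter
from math import factorial

def count_arrangements(word):
    """Distinct arrangements of all letters: n! divided by the factorial of each repeat count."""
    total = factorial(len(word))
    for repeats in Counter(word).values():
        total //= factorial(repeats)
    return total

arrangements = count_arrangements("COUNTERCLOCKWISE")
people = 5.46e9                      # people over the age of 14, as quoted in the text
days_needed = arrangements / people  # one new arrangement per person per day

print(arrangements)                  # 871782912000
print(round(days_needed))            # 160
```

The same function accepts the 45-letter word in the task; choosing a meaningful human context for the result is left to the reader.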


Permutations of n distinct objects with restrictions 

The number of possible arrangements of objects is reduced when restrictions are put in place. 
As a general rule, the number of choices for the restricted positions should be investigated first, 
and then the unrestricted positions can be attended to. 


WORKED EXAMPLE 5.4 


Find the number of ways of arranging six men in a line so that: 
a the oldest man is at the far-left side 
b the two youngest men are at the far-right side 


c the shortest man is at neither end of the line. 


Answer 


Without restrictions, the six men can be arranged in ${}^6P_6 = 6! = 720$ ways. So, with restrictions, there will be fewer than 720 arrangements.

a $1 \times {}^5P_5 = 1 \times 5! = 120$ arrangements. The oldest man takes the far-left position (one choice) and the other five men fill the remaining five positions.

b ${}^4P_4 \times {}^2P_2 = 4! \times 2! = 48$ arrangements. The two youngest men fill the two far-right positions in ${}^2P_2$ ways and the other four men fill the remaining positions in ${}^4P_4$ ways.

c There are only five men who can be placed at the far-left side, so there are only four men who can be placed at the far right. The remaining four positions can be filled by any of the other four men (one of whom is the shortest man) in ${}^4P_4$ ways, as shown.

5 × [4 × 3 × 2 × 1] × 4

$5 \times {}^4P_4 \times 4 = 5 \times 4! \times 4 = 480$ arrangements.

Alternatively, the shortest man can be placed in one of four positions, and the other five positions can be filled in ${}^5P_5$ ways, so $4 \times 5! = 480$.

WORKED EXAMPLE 5.5 


How many odd four-digit numbers greater than 3000 can be made from the digits 
1,2, 3 and 4, each used once? 


Answer 


Restrictions affect the digits in the thousands column and in the units column.

The digit at the far left (i.e. thousands column) can be only 3 or 4, and the digit at the far right (i.e. units column) can be only 1 or 3. The 3 can be placed in either of the restricted positions, so we can investigate separately the four-digit numbers that start with 3, and the four-digit numbers that start with 4.

Start with 3: We must place 1 at the far right (one choice), and the remaining two positions can be filled by the other two digits in ${}^2P_2$ ways, as shown.

1 × [2 × 1] × 1

$1 \times {}^2P_2 \times 1 = 2$ numbers

Start with 4: We can place 1 or 3 at the far right (two choices), and the remaining two positions can be filled by the other two digits in ${}^2P_2$ ways, as shown.

1 × [2 × 1] × 2

$1 \times {}^2P_2 \times 2 = 4$ numbers

$2 + 4 = 6$ odd numbers greater than 3000 can be made.

Alternatively, we could solve this problem by investigating separately the numbers that end with 1, and the numbers that end with 3.

These six numbers are 3241, 3421, 4123, 4213, 4231 and 4321.


WORKED EXAMPLE 5.6 


Find how many ways two mangoes (M ) and three watermelons (W ) can be placed in 
a line if the five fruits are distinguishable and the mangoes: 


a must not be separated   b must be separated

Answer

a The two mangoes can be placed next to each other in ${}^2P_2$ ways. This pair is now considered as a single object to be arranged with the three watermelons, giving a total of four objects to arrange, as shown.

[M₁ M₂]  W₁  W₂  W₃   (the pair counts as 1 object, so there are 4 objects to arrange in ${}^4P_4$ ways)

${}^2P_2 \times {}^4P_4 = 48$ ways

Objects that must not be separated are treated as a single object when arranged with others.

b $120 - 48 = 72$ ways. The mangoes are either together or separated, so we subtract the 48 arrangements in part a from the $5! = 120$ unrestricted arrangements.
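The "treat as a single object" count and the subtraction in Worked example 5.6 can both be checked by listing. This is a small sketch; the labels M1, M2, W1, W2 and W3 are illustrative names for the five distinguishable fruits.

```python
from itertools import permutations

fruits = ["M1", "M2", "W1", "W2", "W3"]

def mangoes_adjacent(line):
    """True when the two mangoes occupy neighbouring positions in the line."""
    return abs(line.index("M1") - line.index("M2")) == 1

lines = list(permutations(fruits))
together = sum(1 for line in lines if mangoes_adjacent(line))
separated = len(lines) - together

print(len(lines), together, separated)   # 120 48 72
```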


EXERCISE 5D 


1 Find how many five-digit numbers can be made using the digits 2, 3, 4, 5 and 6 once each if: 


a there are no restrictions 
b the five-digit number must be: 


i odd ii even iii odd and less than 40000. 


2 Find how many ways four men and two women can stand in a line if: 
a the two women must be at the front 
b there must be a woman at the front and a man at the back 
c the two women must be separated 
d the four men must not be separated 


e no two men may stand next to each other. 


3 Find the ratio of odd-to-even six-digit numbers that can be made using the digits 1, 2, 3, 4,5 and 7. 


4 Find how many ways 10 books can be arranged in a row on a shelf if: 
a the two oldest books must be in the middle two positions 
b the three newest books must not be separated. 
5 Five cows and one set of twin calves can be housed separately in a row of seven stalls in ${}^7P_7 = 5040$ ways. Find in how many of these arrangements:
a the two calves are not in adjacent stalls
b the two calves and their mother, who is one of the 5 cows, are in adjacent stalls
c each calf is in a stall adjacent to its mother.


6 Find how many of the six-digit numbers that can be made from 1, 2, 2, 3, 3 and 3: 


a begin with a 2   b are not divisible by 2.


7 Find the number of distinct arrangements that can be made from all the letters in the word THEATRE when 


the arrangement: 


a begins with two Ts and ends with two Es   b has H as its middle letter
c ends with the three vowels E, A and E.

PS 8 The following diagram shows a row of post boxes with the owners’ names beneath.

Five parcels, one for the owner of each box, have arrived at the post office. If one parcel is randomly placed in each box, find the number of ways in which:
a the five parcels can all be placed in the correct boxes
b exactly one parcel can be placed in the wrong box
c the correct parcels can be placed in Mr A’s and one other person’s box only
d exactly two parcels can be placed in the correct boxes.


Q 9 There are x boys and y girls to be arranged in a line. Find the relationship between x and y if it is not possible 
to separate all the boys. 


Permutations of r objects from n objects 

So far, we have dealt only with permutations in which all of the objects are selected and
arranged. We can now take this a step further and look at permutations in which only 
some of the objects are selected and arranged. When we select and arrange r objects in a 
particular order from n distinct objects, we call this a permutation of r from n. 


Suppose, for example, we wish to select and arrange three letters from the five letters A, B, C, D and E.

We have five choices for the first letter, four for the second, and three choices for the third. This gives us a total of $5 \times 4 \times 3 = 60$ permutations, which is effectively 5! with the factors of 2! missing.

There are $\frac{5!}{(5-3)!} = \frac{5!}{2!} = 60$ permutations altogether.

KEY POINT 5.4

There are ${}^nP_r = \frac{n!}{(n-r)!}$ permutations of r objects from n distinct objects.
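The count of permutations of r objects from n can be obtained in three ways that should agree: the factorial formula, the standard-library function `math.perm` (available from Python 3.8), and a direct listing. A minimal sketch for the A to E example:

```python
from itertools import permutations
from math import factorial, perm

n, r = 5, 3

by_formula = factorial(n) // factorial(n - r)      # n! / (n - r)!
by_library = perm(n, r)                            # built-in equivalent
by_listing = len(list(permutations("ABCDE", r)))   # list every ordered selection

print(by_formula, by_library, by_listing)          # 60 60 60
```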


WORKED EXAMPLE 5.7

How many three-digit numbers can be made from the seven digits 3, 4, 5, 6, 7, 8 and 9, if each is used at most once?

Answer

${}^7P_3 = \frac{7!}{(7-3)!} = \frac{7!}{4!} = 7 \times 6 \times 5 = 210$ three-digit numbers. We select and arrange just three of the seven distinct digits (and ignore four of them).

The choices we have for the first, second and third digits are $7 \times 6 \times 5 = 210$.

WORKED EXAMPLE 5.8

In how many ways can five playing cards from a standard deck of 52 cards be arranged in a row?

Answer

${}^{52}P_5 = \frac{52!}{47!} = 311875200$ ways. We select and arrange five of the 52 playing cards (and ignore 47 of them).

The choices we have are $52 \times 51 \times 50 \times 49 \times 48 = 311875200$.

WORKED EXAMPLE 5.9

In how many ways can 4 out of 18 girls sit on a four-seat sofa when the oldest girl must be given one of the seats?

Answer

$4 \times {}^{17}P_3 = 16320$ ways. The oldest girl can be given any one of the four seats, and the other three seats can be filled by three of the remaining 17 girls in ${}^{17}P_3$ ways.


WORKED EXAMPLE 5.10 


In how many ways can four boys and three girls stand in a row when no two girls are 
allowed to stand next to each other? 


Answer 


${}^4P_4$ ways to arrange 4 boys in a row.

_ B₁ _ B₂ _ B₃ _ B₄ _

Arrange the girls in 3 of these 5 spaces: ${}^5P_3$ ways.

Objects that must be separated are individually placed between or beyond the objects that can be separated.

${}^4P_4 \times {}^5P_3 = 1440$ ways
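The gap method in Worked example 5.10 can be compared with a brute-force count over all 7! = 5040 rows. In this sketch the labels B1 to B4 and G1 to G3 are illustrative names for the seven distinct children.

```python
from itertools import permutations
from math import perm

children = ["B1", "B2", "B3", "B4", "G1", "G2", "G3"]

def no_two_girls_adjacent(row):
    """True when no neighbouring pair in the row consists of two girls."""
    return not any(a[0] == "G" and b[0] == "G" for a, b in zip(row, row[1:]))

brute_force = sum(1 for row in permutations(children) if no_two_girls_adjacent(row))

# Gap method: arrange the boys, then place the girls in 3 of the 5 spaces.
gap_method = perm(4, 4) * perm(5, 3)

print(brute_force, gap_method)   # 1440 1440
```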


EXERCISE 5E 


1 Find how many permutations there are of: 


a five from seven distinct objects b four from nine distinct objects. 
2 From 12 books, how many ways are there to select and arrange exactly half of them in a row on a shelf?


3 In how many ways can gold, silver and bronze medals be awarded for first, second and third places in a race 
between 20 athletes? You may assume that no two athletes tie in these positions.


4 a Find the number of ways in which Alvaro can paint his back door and his front door in a different colour 
if he has 14 colours of paint to choose from. 


b In how many ways could Alvaro do this if he also considered painting them the same colour? 


5 Find how many of the arrangements of four letters from A, B, C, D, E and F: 


a begin with the letter A b contain the letter A. 


6 From a group of 10 boys and seven girls, two are to be chosen to act as the hero and the villain in the school 
play. Find in how many ways this can be done if these two roles are to be played by: 


a any of the children   b two girls or two boys   c a boy and a girl.
7 From a set of 10 rings, a jeweller wishes to display seven of them in their
shop window. The formation of the display is shown in the diagram opposite. 


Find the number of possible displays if, from the set of 10: 


a the ring with the largest diamond must go at the top of the display 


b the most expensive ring must go at the top with the two least expensive 
rings adjacent to it. 


8 Using each digit not more than once, how many even four-digit numbers can be made from the digits
1, 2, 3, 4,5, 6 and 7? 


9 Find how many three-digit numbers can be made from the digits 0, 1, 2, 3 and 4, used at most once each, if the 
three-digit number: 


a must be a multiple of 10   b cannot begin with zero.
10 Give an example of a practical situation where the calculation ${}^nP_r = 120$ might arise.
Q 11 a Under what condition is ${}^nP_r > {}^nP_{n-r}$?
b Given that ${}^nP_r \times {}^nP_{n-r} = k \times {}^nP_n$, find an expression for k in terms of n and r.


12 Five playing cards are randomly selected from a standard deck of 52 cards. These five cards are shuffled, 
and then the top three cards are placed in a row on a table. How many different arrangements of three of the 
52 cards are possible? 


PS 13 Seven chairs, A to G, are arranged as shown.

[Diagram: seven chairs labelled A to G]

In how many ways can the chairs be occupied by 7 of a group of 12 people if three particular people are asked to sit on chairs B, D and F, in any order?


PS 14 A minibus has 11 passenger seats. There are six seats in a row on the sunny side and five seats in a row on the shady side, as shown in the following diagram.

[Diagram: minibus seating plan with the shady side and the sunny side labelled]

Find how many ways eight passengers can be arranged in these seats if:
a there are no restrictions
b one particular passenger refuses to sit on the sunny side
c two particular passengers refuse to sit in seats that are either next to each other or one directly in front of the other.


The rule to determine the number of permutations of n objects 
was known in Indian culture at least as early as 1150 and is
explained in the Līlāvatī by Indian mathematician Bhaskara II.


In his books Campanalogia and Tintinnalogia, Englishman 
Fabian Stedman in 1677 described factorials when explaining 
the number of permutations of the ringing of church bells. 


A complete peal of changes of n bells is made when they are rung in n! sequences without repetition.


The speed at which church bells ring cannot be changed very much by the ringers and this may be why there are at most five bells in most churches. For example, 10 bells can be rung in 3628800 different sequences and it would take the ringers over 3 months to ring a complete peal of changes of 10 bells!

5.3 Combinations

A combination is simply a selection, where the order of selection is not important. 
Choosing strawberries and ice cream from a menu is the same combination as choosing ice 
cream and strawberries. 


When we select r objects in no particular order from n objects, we call this a combination.

A combination of r objects which are then arranged in order is equivalent to a permutation. We write ${}^nC_r$ to mean the number of combinations of r objects from n. Since there are ${}^rP_r = r!$ ways of arranging the r objects, we have:

${}^nC_r \times {}^rP_r = {}^nP_r$
${}^nC_r \times r! = \frac{n!}{(n-r)!}$
${}^nC_r = \frac{n!}{r!(n-r)!}$

KEY POINT 5.5

There are ${}^nC_r = \frac{n!}{r!(n-r)!}$ combinations of r objects from n distinct objects.


Suppose we wish to select three children from a group of five. We can view 
this task as ‘choosing three and ignoring two’ or as ‘choosing to ignore two 
and remaining with three’. Regardless of how we view it, choosing three from 
five and choosing two from five can be done in an equal number of ways, 


and so ${}^5C_3 = {}^5C_2$.

The following three points should be noted.
• ${}^nC_r = {}^nC_{n-r}$
• ${}^nC_r$ is also written as $\binom{n}{r}$
• $\frac{n!}{r!(n-r)!} = \frac{\text{No. we select from}!}{\text{No. selected}! \times \text{No. not selected}!}$

We will use this more modern notation in Chapter 7. However, calculators use the ${}^nC_r$ notation, so we will use this in the current chapter.
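The relationship between combinations and permutations, and the symmetry of "choosing three" and "ignoring two", can be checked with `math.comb` and `itertools.combinations`. This is a minimal sketch; the five children are given the hypothetical labels A to E.

```python
from itertools import combinations
from math import comb, factorial, perm

n, r = 5, 3
children = ["A", "B", "C", "D", "E"]

by_listing = len(list(combinations(children, r)))   # unordered selections of three

# Choosing three from five matches choosing two from five.
print(by_listing, comb(n, r), comb(n, n - r))       # 10 10 10

# Arranging each combination in r! ways recovers the permutation count.
print(comb(n, r) * factorial(r) == perm(n, r))      # True
```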


WORKED EXAMPLE 5.11 


In how many ways can three fish be selected from a bowl containing seven fish and two potatoes? 


Answer 


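${}^7C_3 = 35$ ways. Only the seven fish are available to be selected as fish, so the two potatoes are ignored and we select three from seven.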


WORKED EXAMPLE 5.12

In how many ways can five books and three magazines be selected from eight books and six magazines?

Answer

${}^8C_5 = 56$ ways to select the books.
${}^6C_3 = 20$ ways to select the magazines.

The books and the magazines are selected independently, so we multiply the numbers of combinations.

${}^8C_5 \times {}^6C_3 = 56 \times 20 = 1120$ ways

WORKED EXAMPLE 5.13


A team of five is to be chosen from six women and five men. Find the number of 
possible teams in which there will be more women than men. 


Answer

The table shows the possible make-up of the team when it has more women than men in it, and also the number of ways in which those teams can be chosen.

Women (6)   Men (5)   No. teams
3           2         ${}^6C_3 \times {}^5C_2 = 200$
4           1         ${}^6C_4 \times {}^5C_1 = 75$
5           0         ${}^6C_5 \times {}^5C_0 = 6$

$200 + 75 + 6 = 281$ teams with more women than men.

You will learn about probability distributions for the number of objects that can be selected, such as the number of women selected for this team.


WORKED EXAMPLE 5.14


How many distinct three-digit numbers can be made from five cards, each with one 
of the digits 5, 5, 7, 8 and 9 written on it? 


Answer 


The 5 is a repeated digit, so we must investigate three situations separately.

No 5s selected: ${}^3P_3 = 6$ three-digit numbers. The digits 7, 8 and 9 are selected and arranged.

One 5 selected: ${}^3C_2 \times 3! = 18$ three-digit numbers. Two digits from 7, 8 and 9 are selected and arranged with a 5.

Two 5s selected: ${}^3C_1 \times \frac{3!}{2!} = 9$ three-digit numbers. One digit from 7, 8 and 9 is selected and arranged with two 5s.

The selections in these three situations are mutually exclusive, so we add together the numbers of three-digit numbers.

$6 + 18 + 9 = 33$ three-digit numbers can be made.
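The case-by-case count in Worked example 5.14 can be checked against a listing in which duplicate numbers are removed with a set. This is an illustrative sketch only; the variable names are not from the text.

```python
from itertools import permutations
from math import comb, factorial

cards = [5, 5, 7, 8, 9]

# Arrange any three cards, then keep each resulting number only once.
distinct_numbers = {100 * a + 10 * b + c for a, b, c in permutations(cards, 3)}

# The three mutually exclusive cases from the worked example.
no_fives = factorial(3)
one_five = comb(3, 2) * factorial(3)
two_fives = comb(3, 1) * factorial(3) // factorial(2)
by_cases = no_fives + one_five + two_fives

print(len(distinct_numbers), by_cases)   # 33 33
```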


EXERCISE 5F


1 Find the number of ways in which five apples can be selected from: 


a eight apples b nine apples and 12 oranges. 


2 From seven men and eight women, find how many ways there are to select: 


a four men and five women   b three men and six women   c at least 13 people.


3 a How many different hands of five cards can be dealt from a standard deck of 52 playing cards? 


b How many of the hands in part a consist of three of the 26 red cards and two of the 26 black cards? 


4 a From the 26 letters of the English alphabet, find how many ways there are to choose: 
i six different letters ii 20 different letters. 


b Use your results from part a to find the condition under which ${}^xC_y = {}^xC_z$, where x is a positive integer.

5 In a classroom there are four lights, each operated by a switch that has an on and an off position. How many possible lighting arrangements are there in the classroom?

6 From six boys and seven girls, find how many ways there are to select a group of three children that consists of more girls than boys.

7 A bag contains six red fuses, five blue fuses and four yellow fuses. Find how many ways there are to select:
a three fuses of different colours   b three fuses of the same colour
c 10 fuses in exactly two colours   d nine fuses in exactly two colours.

8 The diagram opposite shows the activities offered to children at a school camp.

If children must choose three activities to fill their day, how many sets of three activities are there to choose from?

9 Two taxis are hired to take a group of eight friends to the airport. One taxi can carry five passengers and the other can carry three passengers.

What information is given in this situation by the fact that ${}^8C_5 = {}^8C_3 = 56$?

10 Ten cars are to be parked in a car park that has 20 parking spaces set out in two rows of 10. Find how many different patterns of unoccupied parking spaces are possible if:
a the cars can be parked in any of the 20 spaces
b the cars are parked in the same row
c the same number of cars are parked in each row
d two more cars are parked in one row than in the other.

11 A boy has eight pairs of trousers, seven shirts and six jackets. In how many ways can he dress in trousers, shirt and jacket if he refuses to wear a particular pair of red trousers with a particular red shirt?

12 A girl has 11 objects to arrange on a shelf but there is room for only seven of them. In how many ways can she arrange seven of the objects in a row along the shelf, if her clock must be included?

13 A Mathematics teacher has 10 different posters to pin up in their classroom but there is enough space for only five of them. They have three posters on algebra, two on calculus and five on trigonometry. In how many ways can they choose the five posters to pin up if:
a there are no restrictions
b they decide not to pin up either of the calculus posters
c they decide to pin up at least one poster on each of the three topics algebra, calculus and trigonometry?

14 Explore 5.1, at the beginning of this chapter, states that there are over 27 million possibilities for the password encrypted as UJSNOL. How many possibilities are there?

15 How many distinct three-digit numbers can be made from 1, 2, 2, 3, 4 and 5, using each at most once?

PS 16 From three sets of twins and four unrelated girls, find how many selections of five people can be made if
exactly: 


a two sets of twins must be included b one set of twins must be included. 


Two women and three men can sit on a five-seater bicycle in 5! = 120 different ways. The 
photo shows an arrangement in which the two women are separated and the three men 
are also separated. 


Consider, separately, the arrangements in which the women, and in which the men, are
all separated from each other.

a Women separated from each other.
Women next to each other = 2!
Arrange three men with the women as a single object = 4!
There are 2! × 4! arrangements in which the women are not separated.
So there are 5! − (2! × 4!) = 72 arrangements in which the women are separated from each other.

b Men separated from each other.
Men next to each other = 3!
Arrange two women with the men as a single object = 3!
There are 3! × 3! arrangements in which the men are not separated.
So there are 5! − (3! × 3!) = 84 arrangements in which the men are separated from each other.


The calculations in a and b follow the same steps; however, the logic in one of them is 
flawed. Which of the two answers is correct? Can you explain why the other answer is 
not correct? 


5.4 Problem solving with permutations and combinations 
Permutations and combinations can be used to find probabilities for certain events. 


If an event consists of a number of favourable permutations that are equiprobable, or a number of favourable combinations that are equiprobable, then

$P(\text{event}) = \frac{\text{No. favourable permutations}}{\text{No. possible permutations}}$ or $P(\text{event}) = \frac{\text{No. favourable combinations}}{\text{No. possible combinations}}$

From the introduction to this chapter, we know that the order of selection matters in a permutation but does not matter in a combination.

Using either of the methods given above can greatly reduce the amount of working required to solve problems in probability. Nevertheless, we must decide carefully which of them, if any, it is appropriate to use.

WORKED EXAMPLE 5.15


There are 15 identical tins on a shelf. None of the tins are labelled but it is known that 
eight contain soup (S), four contain beans (B) and three contain peas (P). 


If seven tins are randomly selected without replacement, find the probability that 
exactly five of them contain soup. 


Answer 


Favourable selections are when five tins of soup and two tins that are not soup are selected. (It is not important whether these two tins contain beans or peas.) We denote the 15 tins by 8S and 7S′, where S′ represents not soup.

${}^8C_5 \times {}^7C_2$ favourable combinations: selecting 5S from 8S and 2S′ from 7S′.
${}^{15}C_7$ possible combinations: selecting seven from 15 tins.

$P(\text{select 5 tins of soup}) = \frac{{}^8C_5 \times {}^7C_2}{{}^{15}C_7} = \frac{56 \times 21}{6435} = \frac{392}{2145}$ or 0.183.

In the numerator and denominator, note that 8 + 7 = 15 and 5 + 2 = 7.
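The probability in Worked example 5.15 is a ratio of two combination counts, which can be computed exactly with `math.comb` and `fractions.Fraction`. A minimal sketch, with illustrative variable names:

```python
from fractions import Fraction
from math import comb

favourable = comb(8, 5) * comb(7, 2)   # 5 soup from 8, and 2 not-soup from 7
possible = comb(15, 7)                 # any 7 tins from 15

p_five_soup = Fraction(favourable, possible)   # reduced automatically

print(favourable, possible)                    # 1176 6435
print(p_five_soup, round(float(p_five_soup), 3))   # 392/2145 0.183
```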


WORKED EXAMPLE 5.16


A girl has a bag containing 13 red cherries (R) and seven black cherries (B). She takes 
five cherries from the bag at random. Find the probability that she takes more red 
cherries than black cherries. 


Answer 


No. red (13)   No. black (7)   No. ways
5              0               ${}^{13}C_5 \times {}^7C_0 = 1287$
4              1               ${}^{13}C_4 \times {}^7C_1 = 5005$
3              2               ${}^{13}C_3 \times {}^7C_2 = 6006$
Total = 12298

There are ${}^{20}C_5 = 15504$ possible selections of five cherries from 20.

$P(\text{more red than black}) = \frac{12298}{15504}$ or 0.793.

From Chapter 4, Section 4.2, recall that P(A or B or C) = P(A) + P(B) + P(C) for mutually exclusive events.

EXPLORE 5.4

We studied conditional probabilities in Chapter 4, Section 4.4.

We can, of course, find the solution to Worked example 5.16 using conditional probabilities.

• There is one way to select 5R and 0B.
• There are five ways to select 4R and 1B.
• There are 10 ways to select 3R and 2B.


Complete the calculations using conditional probabilities. 


Note how much working is involved and how long the calculations take. 


Compare the two approaches to solving this problem and decide for yourself which 
you prefer. 
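As a way of checking your own working, the two approaches to Worked example 5.16 can be sketched in Python and compared. The helper `sequence_probability` is a hypothetical name: it multiplies the conditional probabilities for one particular order of selection (all the reds first, then the blacks), and the result is then multiplied by the number of orders listed in the bullet points above.

```python
from fractions import Fraction
from math import comb

RED, BLACK, TAKEN = 13, 7, 5

# Approach 1: combinations.
favourable = sum(comb(RED, r) * comb(BLACK, TAKEN - r) for r in range(3, TAKEN + 1))
p_combinations = Fraction(favourable, comb(RED + BLACK, TAKEN))

def sequence_probability(reds, blacks):
    """Probability of drawing `reds` red then `blacks` black cherries, in that order."""
    p, red_left, black_left, total_left = Fraction(1), RED, BLACK, RED + BLACK
    for _ in range(reds):
        p *= Fraction(red_left, total_left)
        red_left, total_left = red_left - 1, total_left - 1
    for _ in range(blacks):
        p *= Fraction(black_left, total_left)
        black_left, total_left = black_left - 1, total_left - 1
    return p

# Approach 2: conditional probabilities, with 1, 5 and 10 orders respectively.
p_conditional = (1 * sequence_probability(5, 0)
                 + 5 * sequence_probability(4, 1)
                 + 10 * sequence_probability(3, 2))

print(favourable, p_combinations == p_conditional)   # 12298 True
print(round(float(p_combinations), 3))               # 0.793
```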


WORKED EXAMPLE 5.17 


A minibus has seats for the driver (D) and seven passengers, as shown.

[Diagram: minibus seating plan showing the driver’s seat (D) and the seven passenger seats]

When seven passengers are seated in random order, find the probability that two particular passengers, A and B, are sitting on:

a the same side of the minibus

b opposite sides of the minibus.

Answer

a ${}^3P_2$ ways: A and B both sitting on the driver’s side.
${}^4P_2$ ways: A and B both not sitting on the driver’s side.
${}^7P_2$ ways: A and B sitting in any two of the seven seats.

$P(\text{same side}) = P(\text{both on driver’s side}) + P(\text{both not on driver’s side}) = \frac{{}^3P_2}{{}^7P_2} + \frac{{}^4P_2}{{}^7P_2} = \frac{6}{42} + \frac{12}{42} = \frac{3}{7}$

b $P(\text{opposite sides}) = 1 - P(\text{same side}) = 1 - \frac{3}{7} = \frac{4}{7}$

The events ‘sitting on the same side’ and ‘sitting on opposite sides’ are complementary.
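The answer to Worked example 5.17 can be checked by listing every ordered pair of seats that A and B could occupy. The seat numbering below is an assumption made for illustration: passenger seats 0 to 2 are taken to be on the driver's side and seats 3 to 6 on the other side, matching the ${}^3P_2$ and ${}^4P_2$ counts in the answer.

```python
from fractions import Fraction
from itertools import permutations

# Assumed layout: three passenger seats on the driver's side, four on the other side.
DRIVER_SIDE = {0, 1, 2}

seat_pairs = list(permutations(range(7), 2))   # (seat of A, seat of B)
same_side = sum(1 for a, b in seat_pairs
                if (a in DRIVER_SIDE) == (b in DRIVER_SIDE))

p_same = Fraction(same_side, len(seat_pairs))
p_opposite = 1 - p_same                        # complementary event

print(len(seat_pairs), same_side)              # 42 18
print(p_same, p_opposite)                      # 3/7 4/7
```

The other five passengers can fill the remaining seats in the same number of ways whichever pair of seats A and B take, so they do not affect the probability.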
1 Two children are selected at random from a group of six boys and four girls. Use combinations to find the
probability of selecting:

a two boys b two girls c one boy and one girl.

2 Three chocolates are selected at random from a box containing 10 milk chocolates and 15 dark chocolates.
Find the probability of selecting exactly:

a two dark chocolates b two milk chocolates c two dark chocolates or two milk chocolates.

3 Four bananas are randomly selected from a crate of 17 yellow and 23 green bananas. Find the probability that:

a no green bananas are selected b less than half of those selected are green.

4 A curator has 36 paintings and 44 sculptures from which they will randomly select eight items to display in
their gallery. Find the probability that the display consists of at least three more paintings than sculptures.

5 Five people are randomly selected from a group of 67 women and 33 men. Find the probability that the
selection consists of an odd number of women.

6 In a toolbox there are 25 screwdrivers, 16 drill bits, 38 spanners and 11 chisels. Find the probability that a
random selection of four tools contains no chisels.

7 Five clowns each have a red wig and a blue wig, which they are all equally likely to wear at any particular
time. Find the probability that, at any particular time:

a exactly two clowns are wearing red wigs b more clowns are wearing blue wigs than red wigs.

8 A gardener has nine rose bushes to plant: three have red flowers and six have yellow flowers. If they plant
them in a row in random order, find the probability that:

a a yellow rose bush is in the middle of the row
b the three red rose bushes are not separated
c no two red rose bushes are next to each other.

9 A farmer has 50 animals. They have 24 sheep, of which three are male, and they have 26 cattle, of which
20 are female. A veterinary surgeon wishes to test six randomly selected animals. Find the probability that
the selection consists of:

a equal numbers of cattle and sheep b more females than males.

10 a How many distinct arrangements of the letters in the word STATISTICS are there?
b Find the probability that a randomly selected arrangement begins with:

i three Ts ii three identical letters.

11 Three skirts, four blouses and two jackets are hung in random order on a clothes rail. Find the probability that:
a the three skirts occupy the middle section of the arrangement
b the two jackets are not separated.

12 In a group of 180 people, there are 88 males, nine of whom are left-handed, and there are 85 females who are
not left-handed. If six people are selected randomly from the group, find the probability that exactly four of
them are left-handed or female.
13 A small library holds 1240 books: 312 of the 478 novels (N) have hard
covers (H), and there are 440 books that do not have hard covers.
Some of this information is shown in the Venn diagram opposite.

[Venn diagram: sets N and H for the 1240 books, with unknown values a, b and c.]

a Find the value of a, of b and of c.

b A random selection of 25 of these books is to be donated to a
charity group. The charity group hopes that at least 22 of the
books will be novels or hard covers. Calculate the probability
that the charity group gets what they hope for.


14 A netball team of seven players is to be selected at random from five men and 10 women. Given that at least 
five women are selected for the team, find the probability that exactly two men are selected. 

15 Two items are selected at random from a box that contains some tags and some labels. 
Selecting two tags is five times as likely as selecting two labels. 
Selecting one tag and one label is six times as likely as selecting two labels.


Find the number of tags and the number of labels in the box. 


16 A photograph is to be taken of a pasta dish and n pizzas.
The items are arranged in a line in random order.
Event X is ‘the pasta dish is between two pizzas’.

a Investigate the value of P(X) for values of n from 2 to 5.

b Hence, express the value of $\frac{P(X')}{P(X)}$ in terms of n. Can you justify your answer
for any value of n > 2?

You will find a range of interesting and challenging probability problems
(with hints and solutions) in Module 16 on the NRICH website.


Checklist of learning and understanding

• $n! = n(n-1)(n-2) \ldots \times 3 \times 2 \times 1$, for any integer $n > 0$.
• $0! = 1$
• A key word that points to a permutation is arranged.
• A permutation is a way of selecting and arranging objects in a particular order.
• Key words that point to a combination are chosen and selected.
• A combination is a way of selecting objects in no particular order.

From n distinct objects, there are:

• ${}^nP_n = n!$ permutations of all n objects.
• ${}^nP_r = \frac{n!}{(n-r)!}$ permutations of r objects.

From n objects with p, q, r, … of each type, there are $\frac{n!}{p! \times q! \times r! \times \ldots}$ permutations.
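The counting formulas in this checklist map directly onto Python's standard `math` module. The following is a minimal sketch; the helper `arrangements_with_repeats` and the example word BANANA are illustrative choices, not taken from the text.

```python
import math
from collections import Counter

n, r = 7, 3

# Permutations of all n distinct objects: n!
all_permutations = math.perm(n)   # same value as math.factorial(n)

# Permutations of r objects chosen from n: n! / (n - r)!
n_P_r = math.perm(n, r)

# Combinations of r objects chosen from n (order ignored): n! / (r! (n - r)!)
n_C_r = math.comb(n, r)


def arrangements_with_repeats(word):
    """Distinct arrangements of a word whose letters may repeat."""
    total = math.factorial(len(word))
    for count in Counter(word).values():
        total //= math.factorial(count)  # divide out each repeated letter
    return total


print(all_permutations, n_P_r, n_C_r, arrangements_with_repeats("BANANA"))
```

Each arrangement is counted r! times by `math.perm` relative to `math.comb`, which is why ${}^nP_r = r! \times {}^nC_r$.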
1 The word MARMALADE contains four vowels and five consonants. Find the number of possible
arrangements of its nine letters if:

a there are no restrictions on the order [1]
b the arrangement must begin with the four vowels. [2]

2 Five men, four children and two women are asked to stand in a queue at the post office. Find how many
ways they can do this if:

a the women must be separated [2]
b all of the children must be separated from each other. [3]

3 Find the probability that a randomly selected arrangement of all the letters in the word PALLETTE
begins and ends with the same letter. [3]

4 Eight-digit mobile phone numbers issued by the Lemon Network all begin with 79.
a How many different phone numbers can the network issue? [1]
b Find the probability that a randomly selected number issued by this network:
i ends with the digits 97 [2]
ii reads the same left to right as right to left. [2]

5 There are 12 books on a shelf. Five books are 15 cm tall; four are 20 cm tall and three are 25 cm tall.
Find the number of ways that the books can be arranged on the shelf so that none of them is shorter
than the book directly to its right. [2]

6 The 11 letters of the word REMEMBRANCE are arranged in a line.
i Find the number of different arrangements if there are no restrictions. [1]
ii Find the number of different arrangements which start and finish with the letter M. [2]
iii Find the number of different arrangements which do not have all 4 vowels (E, E, A, E)
next to each other. [3]

4 letters from the letters of the word REMEMBRANCE are chosen.

iv Find the number of different selections which contain no Ms and no Rs and at least 2 Es. [3]
7 Find how many ways 15 children can be divided into three groups of five if:
a there are no restrictions [2]
b two of the children are brothers who must be in the same group. [3]

8 An entertainer has been asked to give a performance consisting of four items. They know three songs,
five jokes, two juggling tricks and can play one tune on the mandolin. Find how many different ways
there are for them to choose the four items if:

a there are no restrictions on their performance [1]
b they decide not to sing any songs [2]
c they are not allowed to tell more than two jokes. [3]

9 From a group of nine people, five are to be chosen at random to serve on a committee. In how many
ways can this be done if two particular people refuse to serve on the committee together? [3]
10 Twenty teams have entered a tournament. In order to reduce the number of teams to eight, they are put
into groups of five and the teams in each group play each other twice. The top two teams in each group 
progress to the next round. From this point on, teams are paired up, playing each other once with the 
losing team being eliminated. How many games are played during the whole tournament? [3] 


11 A bank provides each account holder with a nine-digit card number that is arranged in three blocks,
as shown in the example opposite.

[Figure: example card number arranged in three blocks.]

Find, in index form, the number of card numbers available if:


a there are no restrictions on the digits used [1] 
b none of the three blocks can begin with 0 [2] 
c the two digits in the second block must not be the same [2] 
d the three-, two- and four-digit numbers on the card are even, odd and even, respectively. [3] 


12 A basket holds nine flowers: two are pink, three are yellow and four are red. Four of these flowers are 
chosen at random. Find the probability that at least two of them are red. [4] 


13 Find the number of ways in which 11 different pieces of fruit can be shared between three boys so that 
each boy receives an odd number of pieces of fruit. [5] 


(Ps) 14 A bakery wishes to display seven of its 14 types of cake in a row in its shop window. There are six types of
sponge cake, five types of cheesecake and three types of fruitcake. Find the number of possible displays that
can be made if the bakery places:

a a sponge cake at each end of the row and includes no fruitcakes in the display [2]

b a fruitcake at one end of the row with sponge cakes and cheesecakes placed alternately in the
remainder of the row. [4]


(Ps) 15 Five cards, each marked with a different single-digit number from 3 to 7, are randomly placed in a row. 
Find the probability that the first card in the row is odd and that the three cards in the middle of the 
row have a sum of 15. [4] 


(Ps) 16 Two ordinary fair dice are rolled and the two faces on which they come to rest are hidden by holding the 
dice together, as shown, and lifted off the table. 


The sum of the numbers on the 10 visible faces of the dice is denoted by T. 
a Find the number of possible values of T, and find the most likely value of T. [4]
b Calculate the probability that T < 38. [3] 


(Ps) 17 Three ordinary fair dice are rolled. Find the number of ways in which the number rolled with the first 
die can exceed the sum of the numbers rolled with the second and third dice. Hence, find the probability 
that this event does not occur in two successive rolls of the three dice. [6] 
Chapter 5: Permutations and combinations


(Ps) 18 How many even four-digit numbers can be made from the digits 0, 2, 3, 4, 5 and 7, each used at most
once, when the first digit cannot be zero? [4]
19 a i Find how many numbers there are between 100 and 999 in which all three digits are different. [3]
ii Find how many of the numbers in part i are odd numbers greater than 700. [4] 


b A bunch of flowers consists of a mixture of roses, tulips and daffodils. Tom orders a bunch
of 7 flowers from a shop to give to a friend. There must be at least 2 of each type of flower. 
The shop has 6 roses, 5 tulips and 4 daffodils, all different from each other. Find the number 
of different bunches of flowers that are possible. [4] 
20 Three identical cans of cola, 2 identical cans of green tea and 2 identical cans of orange juice are arranged
in a row. Calculate the number of arrangements if 


i the first and last cans in the row are the same type of drink, [3]


ii the 3 cans of cola are all next to each other and the 2 cans of green tea are not next to each other. [5] 


Cambridge International AS & A Level Mathematics 9709 Paper 63 Q4 June 2010
CROSS-TOPIC REVIEW EXERCISE 2

1 Each of the eight players in a chess team plays 12 games against opponents from other teams. The total number of
wins, draws and losses for the whole team are denoted by X, Y and Z, respectively.

a State the value of X + Y + Z. [1]
b Find the least possible value of Z − X, given that Y = 25. [1]
c Given that none of the players drew any of their games and that X − Z = 50, find the exact mean number
of games won by the players. [2]

2 Six books are randomly given to two girls so that each receives at least one book.
a In how many ways can this be done? [3]
b Are both girls more likely to receive an odd number or an even number of books? Give a reason for your answer. [2]

3 The 60 members of a ballroom dance society wish to participate in a competition but the coach that has been
hired has seats for only 57 people. In how many ways can 57 members be selected if the society’s president and
vice president must be included? [2]

4 Four discs in two colours and in four sizes are placed in any order on either of two sticks. The following
illustration shows one possible arrangement of the four discs.

[Illustration: one possible arrangement of the four discs on the two sticks.]

a Find the number of ways in which the four discs can be arranged so that:
i they are all on the same stick [2]
ii there are two discs on each stick. [2]
b In how many ways can the discs be placed if there are no restrictions? [2]

5 A fair triangular spinner with sides numbered 1, 2 and 3 is spun three times and the numbers that it comes to rest
on are written down from left to right to form a three-digit number.

a How many possible three-digit numbers are there? [1]
b Find the probability that the three-digit number is:
i even [1]
ii odd and greater than 200. [2]

6 A book of poetry contains seven poems, three of which are illustrated. In how many different orders can all the
poems be read if no two illustrated poems are read one after the other? [3]

7 Find the number of ways that seven goats and four sheep can sleep in a row if:
a all the goats must sleep next to each other [2]
b no two sheep may sleep next to each other. [3]

8 A teacher is looking for 6 pupils to appear in the school play and has decided to select them at random from a
group of 11 girls and 13 boys.

a Find the number of ways in which the teacher can select the 6 pupils. [1]
b Two roles in the play must be played by girls; three roles must be played by boys, but the fool can be played by a girl
or by a boy. If the first pupil selected is to play the role of the fool, find the probability that the fool is played by:

i a particular girl [1]
ii a boy. [1]

c If instead, the pupil who is to play the fool is the last of the six pupils selected, investigate what effect this
change in the order of selection has on the probability that the fool is played by:

i a particular girl [3]
ii a boy. [3]

9 A radio presenter has enough time at the end of her show to play five songs. She has 13 songs by four groups to
choose from: five songs by The Anvils, four by The Braziers, three by The Chisels and one by The Dustbins. Find
the number of ways she can choose five songs to play if she decides:

a that there should be no restrictions [1]
b to play all three songs by The Chisels [2]
c to play at least one song by each of the four groups. [4]

10 Students enrolling at an A Level college must select three different subjects to study from the six that are available.
One subject must be chosen from each of the option groups A, B and C, as shown in the following table.

Physics Biology Mathematics
Chemistry Physics Biology
History Mathematics Computing

a One student has chosen to study History and Mathematics. How many subjects do they have to choose from
to complete their selection? [1]

b How many combinations of three subjects are available to a student who enrols at this college? [2]

11 Four ordinary fair dice are arranged in a row. Find the number of ways in which this can be done if the four
numbers showing on top of the dice:

a are all odd [1]
b have a sum that is less than 7. [3]

12 At company V, 12.5% of the employees have a university degree. At company W, 85% of the employees do not
have a university degree. There are 112 employees at company V and 120 employees at company W.

a One employee is randomly selected. Find the probability that they:
i work for company V [1]
ii have a university degree. [2]

b Five employees from company W are selected at random. Find the probability that none of them has a
university degree. [2]

13 One hundred qualified drivers are selected at random. Out of these 100 drivers, of the 40 drivers who wear
spectacles, 30 passed their driving test at the first attempt. Altogether, 25 of the drivers did not pass at their
first attempt.

a Show the data given about the drivers in a clearly labelled table or diagram. [3]

b Did these drivers pass the test at their first attempt independently of whether or not they wear spectacles?
Explain your answer. [3]
14 A conference hall has 24 overhead lights. Pairs of lights are operated by switches next to the main entrance,
and each switch has three numbered settings: 0 (off), 1 (dim), 2 (bright). Find the number of possible lighting
arrangements in the hall if:

a there are no restrictions [1]
b two particular pairs of lights must be on setting 2 [1]
c three lights that are not operated by the same switch and five pairs of lights that are operated by the same
switch are not working. [2]

15 Twelve chairs in two colours are arranged, as shown.

[Diagram: twelve chairs in two colours (blue and green), arranged in rows and in columns labelled A, B, C and D.]

Find in how many ways nine people can sit on these chairs if:

a the two blue chairs in column C must remain unoccupied [2]
b all of the green chairs must be occupied [3]
c more blue chairs than green chairs must be occupied [2]
d at least one of the chairs in row 2 must remain unoccupied. [5]

16 In a certain country, vehicle registration plates consist of seven characters: a letter, followed by a three-digit
number, followed by three letters.

For example: B 474 PQR

The first letter cannot be a vowel; the three-digit number cannot begin with 0; and the first of the last three letters
cannot be a vowel or any of the letters X, Y or Z.

a Find the number of registration plates available. [2]

b Find the probability that a randomly selected registration plate is unassigned, given that there are 48.6 million
vehicle owners in the country, and that each owns, on average, 1.183 registered vehicles. [3]

17 Seats for the guests at an awards ceremony are arranged in two rows of eight and ten, divided by an aisle, as
shown.

[Diagram: two rows of seats, one of eight and one of ten, divided by an aisle.]

Seats are randomly allocated to 18 guests.

a Find the probability that two particular guests are allocated seats:

i on the same side of the aisle [3]
ii in the same row [3]
iii on the same side of the aisle and in the same row. [4]

b Give a reason why the answer to part a iii is not equal to the product of the answers to part a i and part a ii. [1]
Probability distributions


In this chapter you will learn how to:
• identify and use a discrete random variable
• construct a probability distribution table that relates to a given situation involving a discrete
random variable, X, and calculate its expectation, E(X), and its variance, Var(X).
PREREQUISITE KNOWLEDGE

Where it comes from | What you should be able to do | Check your skills
IGCSE / O Level Mathematics | Use the fact that P(A) = 1 − P(A′). | 1 A game can be won (W), lost (L) or drawn (D). Given that P(W′) = 0.46 and P(L′) = 0.65, find P(D).
Chapters 4 and 5, Sections 4.3, 4.4, 4.5 | Distinguish between independent and dependent events, and calculate probabilities accordingly. | 2 Two cubes are selected at random from a bag of three red cubes and three blue cubes. Show that the selected cubes are more likely to both be red when the selections are made with replacement than when the selections are made without replacement.


Tools of the trade

Suppose a trading company is planning a new marketing campaign. The campaign will
probably go ahead only if the most likely outcome is that sales will increase. However, the
company also needs to be aware of worst-case and best-case outcomes, as sales may
decrease or decrease dramatically, stay the same or increase dramatically. The company
will be able to make informed decisions based on its estimates of the probabilities of these
possible outcomes. The likelihood of these outcomes will be based on an analysis of a
probability distribution for the changes in sales.


The probability distribution described above acts as a prediction for future sales and the
risks involved. Suppose the company is considering entering a new line of business but
needs to generate at least $50000 in revenue before it starts to make a profit. If their
probability distribution tells them that there is a 40% chance that revenues will be less than
$50000, then the company knows roughly what level of risk it is facing by entering that new
line of business.


6.1 Discrete random variables


A variable is said to be discrete and random if it can take only certain values that occur by
chance.


For example, when we buy a carton of six eggs, some may be broken; the number of broken
eggs in a carton is a discrete random variable that can take values 0, 1, 2, 3, 4, 5 or 6.


Discrete random variables may arise from independent trials. For example, if we roll four
dice then the number of 6s obtained, S, is a discrete random variable with S ∈ {0, 1, 2, 3, 4}.


A variable is denoted by an upper-case letter and its possible values by the same lower-case letter.
If X can take values of 1, 2 and 3, we write X ∈ {1, 2, 3}, where the symbol ∈ means ‘is an
element of’.


Situations where selections are made without replacement can also generate discrete
random variables. For example, if we randomly select three children from a group of four
boys and two girls, the number of boys selected, B, and the number of girls selected, G, are
discrete random variables with B ∈ {1, 2, 3} and G ∈ {0, 1, 2}.


We learnt how to find probabilities for selections with and without replacement in
Chapters 4 and 5, Sections 4.3, 4.4, 4.5 and 5.4.


6.2 Probability distributions

The probability distribution of a discrete random variable is a display of all its possible
values and their corresponding probabilities. The usual method of display is by tabulation
in a probability distribution table. The probability distribution can also be represented in a
vertical line graph or in a bar chart.


Chapter 6: Probability distributions 


Consider tossing two fair coins, where we can obtain 0, 1 or 2 heads.

The number of heads obtained in each trial, X, is a discrete random variable and X ∈ {0, 1, 2}.
P(X = 0) = P(tails and tails) = 0.5 × 0.5 = 0.25
P(X = 1) = P(heads and tails) + P(tails and heads) = (0.5 × 0.5) + (0.5 × 0.5) = 0.5
P(X = 2) = P(heads and heads) = 0.5 × 0.5 = 0.25

The probability distribution for X is displayed in the following table.

x        | 0    | 1   | 2
P(X = x) | 0.25 | 0.5 | 0.25


The probabilities for the possible values of X are equal to the relative frequencies of the 
values. We would expect 25% of the tosses to produce zero heads; 50% to produce one head 
and 25% to produce two heads. 
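The two-coin distribution can also be built by listing the equally likely outcomes and counting heads. This is a small sketch of that idea; the function name `heads_distribution` is an illustrative choice, and it assumes fair coins so that every outcome is equally likely.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def heads_distribution(num_coins=2):
    """Probability distribution of the number of heads for fair coins."""
    outcomes = list(product("HT", repeat=num_coins))  # HH, HT, TH, TT for two coins
    counts = Counter(outcome.count("H") for outcome in outcomes)
    return {x: Fraction(counts[x], len(outcomes)) for x in sorted(counts)}


dist = heads_distribution()
for x, p in dist.items():
    print(x, p)
```

Using `Fraction` keeps the probabilities exact, so checking that they add up to 1 is straightforward.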


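WORKED EXAMPLE 6.1


A fair square spinner with sides labelled 1, 2, 3 and 4 is spun twice. The two scores
obtained are added together to give the total, X. Draw up the probability distribution
table for X.


Answer

[Diagram: possibility grid of the 16 equally likely outcomes, 1st spin against 2nd spin, showing each total.]

P(X = x) is equal to the relative frequency of each particular value of X.

x        | 2    | 3    | 4    | 5    | 6    | 7    | 8
P(X = x) | 1/16 | 2/16 | 3/16 | 4/16 | 3/16 | 2/16 | 1/16

Note that ΣP(X = x) = 1.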
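The possibility-grid method of Worked example 6.1 can be sketched in code by listing all 16 ordered pairs of spins and counting how often each total occurs. The function name `total_distribution` is hypothetical, and the sketch assumes the two spins are independent and each face is equally likely.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def total_distribution(faces=(1, 2, 3, 4)):
    """Distribution of the sum of two spins of a fair spinner."""
    totals = Counter(first + second for first, second in product(faces, repeat=2))
    num_outcomes = len(faces) ** 2  # 16 equally likely ordered pairs
    return {x: Fraction(count, num_outcomes) for x, count in sorted(totals.items())}


spinner_dist = total_distribution()
for x, p in spinner_dist.items():
    print(x, p)

# The probabilities of all possible values must add up to 1.
print(sum(spinner_dist.values()))
```

Note that `Fraction` prints values in lowest terms, so 2/16 appears as 1/8 and 4/16 as 1/4.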


WORKED EXAMPLE 6.2


The following table shows the probability distribution for the random variable V.

v        | 2    | 3  | 4       | 5         | 6
P(V = v) | 0.05 | c² | c + 0.1 | 2c + 0.05 | 0.16

Find the value of the constant c and find P(V > 4).

Answer

We use Σp = 1 to form and solve an equation in c.

0.05 + c² + c + 0.1 + 2c + 0.05 + 0.16 = 1
c² + 3c − 0.64 = 0
(c − 0.2)(c + 3.2) = 0
c = 0.2 or c = −3.2

The valid solution is c = 0.2.

Note that if c = −3.2, then P(V = 3) = 10.24, P(V = 4) = −3.1 and P(V = 5) = −6.35.

P(V > 4) = P(V = 5) + P(V = 6)
= (2 × 0.2) + 0.05 + 0.16
= 0.61

A probability distribution shows all the possible values of a variable and the sum of the
probabilities is Σp = 1.

Do check whether the solutions are valid. Remember that a probability cannot be less than 0 or
greater than 1.


WORKED EXAMPLE 6.3


There are spaces for three more passengers on a bus, but eight youths, one man and
one woman wish to board.

The bus driver decides to select three of these people at random and allow them
to board.

Draw up the probability distribution table for Y, the number of youths selected.

Answer

Selections are made without replacement, so we can use combinations to find P(Y = y).

Selecting three from 10 people, there are ${}^{10}C_3$ possible selections.

Possible values of Y are 1, 2 and 3. At least one youth will be selected because there are only
two non-youths, who we denote by Y′.

Selecting one from 8Y and two from 2Y′: $P(Y = 1) = \frac{{}^8C_1 \times {}^2C_2}{{}^{10}C_3} = \frac{1}{15}$

Selecting two from 8Y and one from 2Y′: $P(Y = 2) = \frac{{}^8C_2 \times {}^2C_1}{{}^{10}C_3} = \frac{7}{15}$

Selecting three from 8Y and none from 2Y′: $P(Y = 3) = \frac{{}^8C_3 \times {}^2C_0}{{}^{10}C_3} = \frac{7}{15}$

The table shows the probability distribution for Y.

y        | 1    | 2    | 3
P(Y = y) | 1/15 | 7/15 | 7/15

Always check that Σp = 1.

EXERCISE 6A 


1 The discrete random variable V is such that V ∈ {1, 2, 3}. Given that P(V = 1) = P(V = 2) = 2 × P(V = 3), draw
up the probability distribution table for V.


2 The probability distribution for the random variable X is given in the following table.


2 3 4 5 


a 


Pp 2p z7? 3p 


Find the value of p and work out P(2 < X < 5).


3 The probability distribution for the random variable W is given in the following table.


3 6 9 12 15 
k 4 13 

2 ae ao 2 

2k k 2 5 ah 50 


a Form an equation using k, then solve it. 
b Explain why only one of your solutions is valid. 
c Find P(6<W<10). 


The probability that a boy succeeds with each basketball shot is a He takes two shots and the discrete 
random variable S represents the number of successful shots. 


Show that P(S = 0) = = and draw up the probability distribution table for S. 

5 At a garden centre, there is a display of roses: 25 are red, 20 are white, 15 are pink and 5 are orange.
Three roses are chosen at random.

a Show that the probability of selecting three red roses is approximately 0.0527.
b Draw up the probability distribution table for the number of red roses selected.
c Find the probability that at least one red rose is selected.

6 Three vehicles from a company’s six trucks, five vans, three cars and one motorbike are randomly selected
and tested for roadworthiness.

a Show that the probability of selecting three vans is $\frac{2}{91}$.
b Draw up the probability distribution table for the number of vans selected.
c Find the probability that, at most, one van is selected.

7 Five grapes are randomly selected without replacement from a bag containing one red grape and six green
grapes.

Name and list the possible values of two discrete random variables in this situation.

State the relationship between the values of your two variables.
8 A pack of five DVDs contains three movies and two documentaries. Three DVDs are selected and the
following table shows the probability distribution for M, the number of movies selected.

m        | 1   | 2   | 3
P(M = m) | 0.3 | 0.6 | 0.1

Draw up the probability distribution table for D, the number of documentaries selected.

9 In a particular country, 90% of the population is right-handed and 40% of the population has red hair.
Two people are randomly selected from the population. Draw up the probability distribution for X, the
number of right-handed, red-haired people selected, and state what assumption must be made in order
to do this.

10 A fair 4-sided die, numbered 1, 2, 3 and 5, is rolled twice. The random variable X is the sum of the two
numbers on which the die comes to rest.

a Show that $P(X = 8) = \frac{1}{8}$.
b Draw up the probability distribution table for X, and find P(X > 6).

11 There are eight letters in a post box, and five of them are addressed to Mr Nut. Mr Nut removes four letters at
random from the box.

a Find the probability that none of the selected letters are addressed to Mr Nut.
b Draw up the probability distribution table for N, the number of selected letters that are addressed
to Mr Nut.
c Describe one significant feature of a vertical line graph or bar chart that could be used to represent the
probability distribution for N.

12 A discrete random variable Y is such that Y ∈ {8, 9, 10}. Given that P(Y = y) = ky, find the value of the
constant k.

13 Q is a discrete random variable and Q ∈ {3, 4, 5, 6}.

a Given that P(Q = q) = cq², find the value of the constant c.
b Hence, find P(Q > 4).

14 Four books are randomly selected from a box containing 10 novels, 10 reference books and 5 dictionaries.
The random variable N represents the number of novels selected.

a Find the value of P(N = 2), correct to 3 significant figures.
b Without further calculation, state which of N = 0 or N = 4 is more likely. Explain the reasons for your
answer.

15 In a game, a fair 4-sided spinner with edges labelled 0, 1, 2 and 3 is spun. If a player spins 1, 2 or 3, then that is
their score. If a player scores 0, then they spin a fair triangular spinner with edges labelled 0, 1 and 2, and the
number they spin is their score. Let the variable X represent a player’s score.

a Show that $P(X = 0) = \frac{1}{12}$.
b Draw up the probability distribution table for X, and find the probability that X is a prime number.
16 A biased coin is tossed three times. The probability distribution for H, the number of heads obtained, is
shown in the following table.

h        | 0     | 1     | 2     | 3
P(H = h) | 0.512 | 0.384 | 0.096 | a

a Find the probability of obtaining a head each time the coin is tossed.
b Give another discrete random variable that is related to these trials, and calculate the probability that its
value is greater than the value of H.


17 Two ordinary fair dice are rolled. A score of 3 points is awarded if exactly one die shows an odd number and
there is also a difference of 1 between the two numbers obtained. A player who rolls two even numbers is
awarded a score of 2 points, otherwise a player scores 1 point.


a Draw up the probability distribution table for S, the number of points awarded.
b Find the probability that a player scores 3 points, given that the sum of the numbers on their two dice is
greater than 9.

18 The discrete random variable R is such that R ∈ {1, 3, 5, 7}.
a Given that P(R=r)= ar find the value of the constant k. 
b Hence, find P(R < 4). 


EXPLORE 6.1


Consider the probability distribution for X, the number of heads obtained when two
fair coins are tossed, which was given in the table presented in the introduction of
Section 6.2. Sketch or simply describe the shape of a bar chart (or vertical line graph)
that can be used to represent this distribution.

This can be done manually or using the Coin Flip Simulation on the GeoGebra website.


In this activity, you will investigate how the shape of the distribution of X is altered
when two unfair coins are tossed; that is, when the probability of obtaining heads
is p ≠ 0.5.


Consider the case in which p = 0.4 for both coins. Draw a bar chart to represent the
probability distribution of X, the number of heads obtained.


Next consider the case in which p = 0.6 for both coins, and draw a bar chart to
represent the probability distribution of X.


What do you notice about the bar charts for p = 0.4 and p = 0.6?


Investigate other pairs of probability distributions for which the values of p add
up to 1, such as p = 0.3 and p = 0.7. Make general comments to summarise your
results.


Investigate how the value of P(X = 1) changes as p increases from 0 to 1, and then
represent this graphically. On the same diagram, show how the values of P(X = 0)
and P(X = 2) change as p increases from 0 to 1.


We will learn how to extend this Explore activity to more than two coins in Chapter 7.

We will see how to represent the probability distribution for a continuous random
variable in Chapter 8, Section 8.1.
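For the investigation in Explore 6.1, a small helper can tabulate the distribution of X for any value of p. This is only a sketch: it assumes the two tosses are independent and that both coins have the same probability p of showing heads, and the name `two_coin_distribution` is an illustrative choice.

```python
from fractions import Fraction


def two_coin_distribution(p):
    """Distribution of the number of heads, X, for two independent coins."""
    q = 1 - p  # probability of tails on each coin
    return {
        0: q * q,      # tails and tails
        1: 2 * p * q,  # heads-tails or tails-heads
        2: p * p,      # heads and heads
    }


biased = two_coin_distribution(Fraction(2, 5))  # p = 0.4
for x, prob in biased.items():
    print(x, prob)
```

Calling the function with other values of p gives the heights of the bars to draw and compare.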
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i) DID YOU KNOW? K 


The simple conjecture of Fermat’s Last Theorem, which is that 
x” + y” =z" has no positive integer solutions for any integer 
n > 2, defeated the greatest mathematicians for 350 years. 


The theorem is simple in that it says ‘a square can be divided 
into two squares, but a cube cannot be divided into two cubes, 
nor a fourth power into two fourth powers, and so on’. Pierre 
de Fermat himself claimed to have a proof but only wrote in 
his notebook that ‘this margin is too narrow to contain it’! 


Fermat’s correspondence with the French 
mathematician, physicist, inventor and philosopher Blaise 
Pascal helped to develop a very important concept in basic 
probability that was revolutionary at the time; 
namely, the idea of equally likely outcomes and 
expected values. 


Since his death in 1665, substantial prizes have been 
offered for a proof, which was finally delivered by 
Briton Andrew Wiles in 1995. 


Wiles’ proof used highly advanced 20th century mathematics 
(i.e. functions of complex numbers in hyperbolic space and the 
doughnut-shaped solutions of elliptic curves!) that 
was not available to Fermat. 

[Photo: Andrew Wiles] 


6.3 Expectation and variance of a discrete random variable 

Values of a discrete random variable with high probabilities are expected to occur more 
frequently than values with low probabilities. When a number of trials are carried out, a 
frequency distribution of values is produced, and this distribution has a mean or expected value. 


Expectation 
The mean of a discrete random variable X is referred to as its expectation, and is 
written E(X). 


TIP: We can think of E(X) as being the long-term average value of X over a large 
number of trials. 


Suppose we have a biased spinner with which we can score 0, 1, 2 or 3. The probabilities for 
these scores, X, are as given in the following table and are also represented in the graph. 

x           0     1     2     3 
P(X = x)    0.1   0.3   0.4   0.2 

[Graph: vertical line graph of P(X = x) against x] 


However many times it is spun, we expect to score 0 with 10% of the spins; 1 with 
30%; 2 with 40% and 3 with 20%. 


The expected frequencies of the scores in 1600 trials are shown in the following table. 

x                       0                  1                  2                  3 
Expected frequency, f   0.1 × 1600 = 160   0.3 × 1600 = 480   0.4 × 1600 = 640   0.2 × 1600 = 320 
From this table of expected frequencies, we can calculate the mean (expected) score in 1600 
trials. 


$$\text{Mean} = E(X) = \frac{\sum xf}{\sum f} = \frac{(0 \times 160) + (1 \times 480) + (2 \times 640) + (3 \times 320)}{1600} = 1.7$$


We obtain the same value for E(X) if relative frequencies (i.e. probabilities) are used 
instead of frequencies. 


$$\text{Mean} = E(X) = \frac{\sum xp}{\sum p} = \frac{(0 \times 0.1) + (1 \times 0.3) + (2 \times 0.4) + (3 \times 0.2)}{1} = 1.7$$
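
The agreement between the two calculations can be checked numerically. This is a minimal sketch using the spinner's probabilities from the table above; the variable and function names are illustrative only.

```python
def expectation(values, probabilities):
    """E(X) as the sum of x * P(X = x); assumes the probabilities sum to 1."""
    return sum(x * p for x, p in zip(values, probabilities))

scores = [0, 1, 2, 3]
probs = [0.1, 0.3, 0.4, 0.2]
trials = 1600

# Expected frequencies in 1600 trials, then the mean computed from them.
freqs = [p * trials for p in probs]
mean_from_freqs = sum(x * f for x, f in zip(scores, freqs)) / sum(freqs)

# The same mean computed directly from the probabilities.
mean_from_probs = expectation(scores, probs)

print(round(mean_from_freqs, 10), round(mean_from_probs, 10))
```

Changing `trials` to any other positive number leaves both results unchanged, which is why the probabilities alone are enough.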


EXPLORE 6.2 


Adam and Priya each have a bag of five cards, numbered 1, 2, 3, 4 and 5. They 
simultaneously select a card at random from their bag and place it face-up on a table. 
The numerical difference between the numbers on their cards, X, is recorded, where 
X ∈ {0, 1, 2, 3, 4}. They repeat this 200 times and use their results to draw up a 
probability distribution table for X. 


Adam suggests a new experiment in which the procedure will be the same, except 
that each of them can choose the card that they place on the table. He says the 
probability distribution for X will be very different because the cards are not selected 
at random. Priya disagrees, saying that it will be very similar, or may even be exactly 
the same. 


Do you agree with Adam or Priya? Explain your reasoning. 


Variance 

The variance and standard deviation of a discrete random variable give a measure of the 
spread of values around the mean, E(X). These measures, like E(X), can be calculated 
using probabilities in place of frequencies. 


If we replace f by p and replace $\bar{x}$ by E(X) in the second of the two formulae for variance, 
$\frac{\sum x^2 f}{\sum f} - \bar{x}^2$, we obtain $\frac{\sum x^2 p}{\sum p} - \{E(X)\}^2$, which simplifies to $\sum x^2 p - \{E(X)\}^2$ because $\sum p = 1$. 


KEY POINT 6.3 


The variance of a discrete random variable is $\mathrm{Var}(X) = \sum x^2 p - \{E(X)\}^2$. 


WORKED EXAMPLE 6.4 


The following table shows the probability distribution for X. Find its expectation, 
variance and standard deviation. 


x           0      5      15     20 
P(X = x)    1/12   3/12   5/12   3/12 


The expectation of a discrete random variable is $E(X) = \sum xp$. 


TIP: The denominator is $\sum p = 1$, so we can omit it from our calculation of E(X). 


TIP: An alternative way to write the formula for expectation is 
$E(X) = \sum [x \times P(X = x)]$. 


REWIND: We can remember variance from Chapter 3, Section 3.3 as ‘mean of the 
squares minus square of the mean’. 
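Answer 

Substitute values of X and P(X = x) into the formula $E(X) = \sum xp$. The working is 
simpler when all fractions have the same denominator. 

$E(X) = \left(0 \times \frac{1}{12}\right) + \left(5 \times \frac{3}{12}\right) + \left(15 \times \frac{5}{12}\right) + \left(20 \times \frac{3}{12}\right)$ 
$= \frac{1}{12} \times [(0 \times 1) + (5 \times 3) + (15 \times 5) + (20 \times 3)]$ 
$= \frac{1}{12} \times 150$ 
$= 12.5$ 

Substitute into the formula for Var(X). Remember to subtract the square of E(X) 
when calculating variance. 

$\mathrm{Var}(X) = \left(0^2 \times \frac{1}{12}\right) + \left(5^2 \times \frac{3}{12}\right) + \left(15^2 \times \frac{5}{12}\right) + \left(20^2 \times \frac{3}{12}\right) - \{12.5\}^2$ 
$= \frac{1}{12} \times [(0 \times 1) + (25 \times 3) + (225 \times 5) + (400 \times 3)] - 156.25$ 
$= \frac{2400}{12} - 156.25$ 
$= 43.75$ 

Take the square root of the variance. 

$\mathrm{SD}(X) = \sqrt{43.75} = 6.61$, correct to 3 significant figures. 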
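
The arithmetic in this worked example can be reproduced exactly with fractions. The sketch below assumes the distribution from the table (values 0, 5, 15, 20 with probabilities 1/12, 3/12, 5/12, 3/12); the helper names are illustrative.

```python
from fractions import Fraction
from math import sqrt

def expectation(dist):
    """E(X) = sum of x * p over a {value: probability} mapping."""
    return sum(x * p for x, p in dist.items())

def variance(dist):
    """Var(X) = sum of x^2 * p, minus the square of E(X)."""
    mean = expectation(dist)
    return sum(x * x * p for x, p in dist.items()) - mean ** 2

dist = {
    0: Fraction(1, 12),
    5: Fraction(3, 12),
    15: Fraction(5, 12),
    20: Fraction(3, 12),
}

mean = expectation(dist)   # exact value 25/2
var = variance(dist)       # exact value 175/4
sd = sqrt(float(var))      # standard deviation as a decimal

print(mean, var, round(sd, 2))
```

Using `Fraction` keeps every intermediate value exact, so rounding only happens at the final square root.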


EXERCISE 6B 


1 The probability distribution for the random variable X is given in the following table. 

x           0      1      2      3 
P(X = x)    0.10   0.12   0.36   0.42 


Calculate E(X) and Var(X). 


2 The probability distribution for the random variable Y is given in the following table. 


y           0      1     2      3    4 
P(Y = y)    0.03   2p    0.32   p    0.05 


a Find the value of p. 


b Calculate E(Y) and the standard deviation of Y. 


3 The random variable T is such that T ∈ {1, 3, 6, 10}. Given that the four possible values of T are equiprobable, 
find E(T) and Var(T). 


4 The following table shows the probability distribution for the random variable V. 


v           1     3      9      m 
P(V = v)    0.4   0.28   0.14   0.18 


Given that E(V) = 5.38, find the value of m and calculate Var(V). 


5 R is a random variable such that R ∈ {10, 20, 70, 100}. Given that P(R = r) is proportional to r, show that 
E(R) = 77 and find Var(R). 
6 The probability distribution for the random variable W is given in the following table. 

w           2     7     a     24 
P(W = w)    0.3   0.3   0.1   0.3 


Given that E(W) = a, find a and evaluate Var(W). 


7 The possible outcomes from a business venture are graded from 5 to 1, as shown in the following table. 

Grade         5             4             3         2            1 
Outcome       High profit   Fair profit   No loss   Small loss   Heavy loss 
Probability   0.24          0.33          0.24      0.11         0.08 


a Calculate the expected grade and use it to describe the expected outcome of the venture. Find the standard 
deviation and explain what it gives a measure of in this case. 


b Investigate the expected outcome and the standard deviation when the grading is reversed (i.e. high profit 
is graded 1, and so on). Compare these outcomes with those from part a. 
8 Two ordinary fair dice are rolled. The discrete random variable X is the lowest common multiple of the two 
numbers rolled. 
a Draw up the probability distribution table for X. 
b Find E(X) and P[X > E(X)]. 
c Calculate Var(X). 


9 In a game, a player attempts to hit a target by throwing three darts. With each throw, a player has a 30% 
chance of hitting the target. 


a Draw up the probability distribution table for H, the number of times the target is hit in a game. 


b How many times is the target expected to be hit in 1000 games? 


10 Two students are randomly selected from a class of 12 girls and 18 boys. 
a Find the expected number of girls and the expected number of boys. 


b Write the ratio of the expected number of girls to the expected number of boys in simplified form. What do 
you notice about this ratio? 


c Calculate the variance of the number of girls selected. 

11 A sewing basket contains eight reels of cotton: four are green, three are red and one is yellow. Three reels of 
cotton are randomly selected from the basket. 
a Show that the expected number of yellow cotton reels is 0.375. 
b Find the expected number of red cotton reels. 
c Hence, state the expected number of green cotton reels. 

12 A company offers a $1000 cash loan to anyone earning a monthly salary of at least $2000. To secure the loan, 
the borrower signs a contract with a promise to repay the $1000 plus a fixed fee before 3 months have elapsed. 


Failure to do this gives the company a legal right to take $1546 from the borrower’s next salary before 
returning any amount that has been repaid. 


From past experience, the company predicts that 70% of borrowers succeed in repaying the loan plus the fixed 
fee before 3 months have elapsed. 
a Calculate the fixed fee that ensures the company an expected 40% profit from each $1000 loan. 
b Assuming that the company charges the fee found in part a, how would it be possible, without changing 
the loan conditions, for the company’s expected profit from each $1000 loan to be greater than 40%? 


13 When a scout group of 8 juniors and 12 seniors meets on a Monday evening, one scout is randomly 
selected to hoist a flag. Let the variable X represent the number of juniors selected over n consecutive 
Monday evenings. 


a By drawing up the probability distribution table for X, or otherwise, show that E(X )=1.2 when n=3. 
b Find the number of Monday evenings over which 14 juniors are expected to be selected to hoist 


the flag. 


14 An ordinary fair die is rolled. If the die shows an odd number then S, the score awarded, is equal to that 
number. If the die shows an even number, then the die is rolled again. If on the second roll it shows an odd 
number, then that is the score awarded. If the die shows an even number on the second roll, the score awarded 
is equal to half of that even number. 


a List the possible values of S and draw up a probability distribution table. 
b Find P[S > E(S)]. 


c Calculate the exact value of Var(S). 


(PS) 15 A fair 4-sided spinner with sides labelled A, B, B, B is spun four times. 
a Show that there are six equally likely ways to obtain exactly two Bs with the four spins. 
b By drawing up the probability distribution table for X, the number of times the spinner comes to rest 
on B, find the value of $\frac{\mathrm{Var}(X)}{E(X)}$. 
c What, in the context of this question, does the value found in part b represent? 


EXPLORE 6.3 


In this activity we will investigate a series of trials in which each can result in one of 
two possible outcomes. 


A ball is dropped into the top of the device shown in the diagram. 


When a ball hits a nail (which is shown as a red dot), there are two equally likely 
outcomes: it can fall to the left or it can fall to the right. 


Using L and R (to indicate left and right), list all the ways that a ball can fall into 
each of the cups A, B and C. 


Use your lists to tabulate the probabilities of a ball falling into each of the cups. Give 
all probabilities with denominator 4. 


FAST FORWARD: We will study the expectation of two special discrete random 
variables, and the variance of one of them, in Chapter 7, Sections 7.1 and 7.2. 
The diagram shows a similar device with four cups labelled A to D. 


List all the ways that a ball can fall into each of the cups. 


Use your lists to tabulate the probabilities of a ball falling into each of the cups. Give 
all probabilities with denominator 8. 


Can you explain how and why the values $\left(\frac{1}{2} + \frac{1}{2}\right)^2$ and $\left(\frac{1}{2} + \frac{1}{2}\right)^3$ are connected with the 
probabilities in your tables? 


The next device in the sequence has 10 nails on four rows. Tabulate the probabilities 
of a ball falling into each of its five cups, A to E. 


FAST FORWARD: We will study independent trials that have only two possible 
outcomes, such as left or right and success or failure, in Chapter 7, 
Sections 7.1 and 7.2. 
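
The listing of L and R routes can be automated as a check on hand-written lists. The sketch below rests on two assumptions that are not stated in the activity: a device with k rows of nails has k + 1 cups, and the cups are indexed from the left by the number of times the ball falls to the right. The function name `cup_counts` is illustrative.

```python
from itertools import product

def cup_counts(rows):
    """Count the equally likely L/R routes ending in each cup.

    Assumes cup index = number of R moves, so a device with `rows`
    rows of nails has rows + 1 cups, numbered 0 (leftmost) upwards.
    """
    counts = [0] * (rows + 1)
    for route in product("LR", repeat=rows):
        counts[route.count("R")] += 1
    return counts

for rows in (2, 3, 4):
    total = 2 ** rows  # number of equally likely routes
    print(rows, [f"{c}/{total}" for c in cup_counts(rows)])
```

Each count divided by the total number of routes gives a probability with the denominator requested in the activity.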


Checklist of learning and understanding 


● A discrete random variable can take only certain values and those values occur in a certain 
random manner. 


● A probability distribution for a discrete random variable is a display of all its possible values 
and their corresponding probabilities. 


● For the discrete random variable X: 
$\sum p = 1$ 
$E(X) = \sum xp$ 
$\mathrm{Var}(X) = \sum x^2 p - \{E(X)\}^2$ 
1 Find the mean and the variance of the discrete random variable X, whose probability distribution is given in the 
following table. [3] 

x           1       2        3        4 
P(X = x)    1 − k   2 − 3k   3 − 4k   4 − 6k 


2 The following table shows the probability distribution for the random variable Y. 

y           1     10    q     101 
P(Y = y)    0.2   0.4   0.2   0.2 

a Given that Var(Y) = 1385.2, show that $q^2 - 61q + 624 = 0$ and solve this equation. [4] 
b Find the greatest possible value of E(Y). [2] 


3 An investment company has produced the following table, which shows the probabilities of various percentage 
profits on money invested over a period of 3 years. 

Profit (%)    1      5      10     15     20     30     40     45     50 
Probability   0.05   0.10   0.50   0.20   0.05   0.04   0.03   0.02   0.01 


a Calculate the expected profit on an investment of $50000. [3] 


b A woman considers investing $50000 with the company, but decides that her money is likely to earn more 
when invested over the same period in a savings account that pays r% compound interest per annum. 


Calculate, correct to 2 decimal places, the least possible value of r. [3] 


4 A chef wishes to decorate each of four cupcakes with one randomly selected sweet. They choose the sweets at 
random from eight toffees, three chocolates and one jelly. Find the variance of the number of cupcakes that will 
be decorated with a chocolate sweet. [6] 


5 The faces of a biased die are numbered 1, 2, 3, 4, 5 and 6. The random variable X is the score when the die is 
thrown. The probability distribution table for X is given. 


The die is thrown 3 times. Find the probability that the score is at least 4 on at least 1 of the 3 throws. [5] 
Cambridge International AS & A Level Mathematics 9709 Paper 61 Q2 June 2016 [Adapted] 


6 A picnic basket contains five jars: one of marmalade, two of peanut butter and two of jam. A boy removes one 
jar at random from the basket and then his sister takes two jars, both selected at random. 


a Find the probability that the sister selects her jars from a basket that contains: 
i exactly one jar of jam [1] 
ii exactly two jars of jam. [1] 


b Draw up the probability distribution table for J, the number of jars of jam selected by the sister, and show 
that E(J) = 0.8. [4] 
7 Two ordinary fair dice are rolled. The product and the sum of the two numbers obtained are calculated. 
The score awarded, S, is equal to the absolute (i.e. non-negative) difference between the product and 
the sum. 


For example, if 5 and 3 are rolled, then S = (5 × 3) − (5 + 3) = 7. 
a State the value of S when 1 and 4 are rolled. [1] 


b Draw up a table showing the probability distribution for the 14 possible values of S, and use it 
to calculate E(S). [5] 


8 A fair triangular spinner has sides labelled 0, 1 and 2, and another fair triangular spinner has sides labelled 
—1,0 and 1. The score, X, is equal to the sum of the squares of the two numbers on which the spinners 
come to rest. 


a List the five possible values of X. [1] 

b Draw up the probability distribution table for X. [3] 

c Given that X < 4, find the probability that a score of 1 is obtained with at least one of the spinners. [2] 

d Find the exact value of a, such that the standard deviation of X is - x E(X). [3] 
9 A discrete random variable X, where X ∈ {2, 3, 4, 5}, is such that P(X = x) = or 

a Calculate the two possible values of b. [3] 

b Hence, find P(2 < X <5). [2] 


10 Set A consists of the ten digits 0, 0, 0, 0, 0, 0, 2, 2, 2, 4. 
Set B consists of the seven digits 0, 0, 0, 0, 2, 2, 2. 


One digit is chosen at random from each set. The random variable X is defined as the sum of these 
two digits. 

i Show that P(X = 2) = 3/7. [2] 
ii Tabulate the probability distribution of X. [2] 
iii Find E(X) and Var(X). [3] 
iv Given that X = 2, find the probability that the digit chosen from set A was 2. [2] 


Cambridge International AS & A Level Mathematics 9709 Paper 63 Q5 June 2010 


11 The discrete random variable Y is such that Y ∈ {4, 5, 8, 14, 17} and P(Y = y) is directly proportional to $\frac{1}{y + 1}$. 
Find P(Y > 4). [4] 
(PS) 12 X is a discrete random variable and X ∈ {0, 1, 2, 3}. Given that P(X > 1) = 0.24, P(0 < X < 3) = 0.5 and 
P(X = 0 or 2) = 0.62, find P(X < 2 | X > 0). [5] 


(PS) 13 Four students are to be selected at random from a group that consists of seven boys and x girls. The variables B 
and G are, respectively, the number of boys selected and the number of girls selected. 


a Given that P(B = 1) = P(B =2), find the value of x. [3] 
b Given that G ≥ 3, find the probability that G = 4. [3] 
14 A box contains 2 green apples and 2 red apples. Apples are taken from the box, one at a time, without 
replacement. When both red apples have been taken, the process stops. The random variable X is the number of 
apples which have been taken when the process stops. 


i Show that P(X = 3) = 1/3. [3] 
ii Draw up the probability distribution table for X. [3] 


Another box contains 2 yellow peppers and 5 orange peppers. Three peppers are taken from the box 
without replacement. 


iii Given that at least 2 of the peppers taken from the box are orange, find the probability that all 3 peppers 
are orange. [5] 


Cambridge International AS & A Level Mathematics 9709 Paper 63 Q7 November 2014 


15 In a particular discrete probability distribution the random variable X takes the value - with probability 
$\frac{r}{45}$, where r takes all integer values from 1 to 9 inclusive. 

i Show that P(X = 40) =<. [2] 
ii Construct the probability distribution table for X. [3] 
iii Which is the modal value of X? [1] 
iv Find the probability that X lies between 18 and 100. [2] 


Cambridge International AS & A Level Mathematics 9709 Paper 62 Q5 November 2009 
In this chapter you will learn how to: 


● use formulae for probabilities for the binomial and geometric distributions, and recognise 
practical situations in which these distributions are suitable models 

● use formulae for the expectation and variance of the binomial distribution and for the expectation 
of the geometric distribution. 


PREREQUISITE KNOWLEDGE 


Where it comes from | What you should be able to do | Check your skills 

Chapter 4 | Calculate expectation in a fixed number of repeated independent trials, given the probability that a particular event occurs. | 1 Two ordinary fair dice are rolled 378 times. How many times can we expect the sum of the two numbers rolled to be greater than 8? 

IGCSE / O Level Mathematics; Pure Mathematics 1 | Expand products of algebraic expressions. Use the expansion of $(a + b)^n$, where n is a positive integer. | 2 Given that $(a + b)^3 = a^3 + 3a^2b + 3ab^2 + b^3$, find the four fractions in the expansion of $\left(\frac{1}{4} + \frac{3}{4}\right)^3$ and confirm that their sum is equal to 1. 
Two special discrete distributions 

Seen in very simple terms, all experiments have just two possible outcomes: success or 
failure. A business investment can make a profit or a loss; the defendant in a court case is 
found innocent or guilty; and a batter in a cricket match is either out or not! 


In most real-life situations, however, there are many possibilities between success and 
failure, but taking this yes/no view of the outcomes does allow us to describe certain 
situations using a mathematical model. 


Two such situations concern discrete random variables that arise as a result of repeated 
independent trials, where the probability of success in each trial is constant. 


e A binomial distribution can be used to model the number of successes in a fixed number 
of independent trials. 


e A geometric distribution can be used to model the number of trials up to and including 
the first success in an infinite number of independent trials. 


7.1 The binomial distribution 


Consider an experiment in which we roll four ordinary fair dice. 
In each independent trial, we can obtain zero, one, two, three or four 6s. 
Let the variable R be the number of 6s rolled, then R e {0, 1, 2, 3, 4}. 


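To find the probability distribution for R, we must calculate P(R = r) for all of its 
possible values. 


Using 6 to represent a success and X to represent a failure in each trial, we have: 


P(success) = P(6) = $\frac{1}{6}$ and P(failure) = P(X) = $\frac{5}{6}$. 


Calculations to find P(R = r) are shown in the following table. 

r | Ways to obtain r 6s | Number of ways | P(R = r) 
0 | (XXXX) | $^4C_0 = 1$ | $^4C_0 \left(\frac{1}{6}\right)^0 \left(\frac{5}{6}\right)^4$ 
1 | (6XXX), (X6XX), (XX6X), (XXX6) | $^4C_1 = 4$ | $^4C_1 \left(\frac{1}{6}\right)^1 \left(\frac{5}{6}\right)^3$ 
2 | (66XX), (6X6X), (6XX6), (X66X), (X6X6), (XX66) | $^4C_2 = 6$ | $^4C_2 \left(\frac{1}{6}\right)^2 \left(\frac{5}{6}\right)^2$ 
3 | (666X), (66X6), (6X66), (X666) | $^4C_3 = 4$ | $^4C_3 \left(\frac{1}{6}\right)^3 \left(\frac{5}{6}\right)^1$ 
4 | (6666) | $^4C_4 = 1$ | $^4C_4 \left(\frac{1}{6}\right)^4 \left(\frac{5}{6}\right)^0$ 


In the table, we see that $P(R = r) = {}^4C_r \left(\frac{1}{6}\right)^r \left(\frac{5}{6}\right)^{4 - r}$. 

Using the $\binom{n}{r}$ notation, the five probabilities shown in the previous table are given by 
$P(R = r) = \binom{4}{r} \left(\frac{1}{6}\right)^r \left(\frac{5}{6}\right)^{4 - r}$. 

TIP: These five probabilities are the terms in the binomial expansion of $\left(\frac{5}{6} + \frac{1}{6}\right)^4$. 


A discrete random variable that meets the following criteria is said to have a binomial 
distribution and it is defined by its two parameters, n and p. 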


● There are n repeated independent trials. 

● n is finite. 

● There are just two possible outcomes for each trial (i.e. success or failure). 
● The probability of success in each trial, p, is constant. 


The random variable is the number of trials that result in a success. 


A discrete random variable, X, that has a binomial distribution is denoted by X ~ B(n, p). 


KEY POINT 7.1 

If X ~ B(n, p) then the probability of r successes is $p_r = \binom{n}{r} p^r (1 - p)^{n - r}$. 


TIP 

Values of $\binom{n}{r}$ are the coefficients of the terms in a binomial expansion, and give the number of 
ways of obtaining r successes in n trials. 


$p^r (1 - p)^{n - r}$ is the probability for each way of obtaining r successes and (n − r) failures. 
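
The binomial probability formula translates directly into code. The following is a minimal sketch rather than a replacement for a calculator's built-in function; `binomial_pmf` is an illustrative name, and `math.comb(n, r)` supplies the binomial coefficient.

```python
from math import comb

def binomial_pmf(r, n, p):
    """P(X = r) for X ~ B(n, p): (n choose r) * p^r * (1 - p)^(n - r)."""
    return comb(n, r) * p ** r * (1 - p) ** (n - r)

# Distribution of R, the number of 6s when four fair dice are rolled.
dist = [binomial_pmf(r, 4, 1 / 6) for r in range(5)]

for r, prob in enumerate(dist):
    print(r, round(prob, 4))
print("total:", round(sum(dist), 10))
```

The probabilities for all possible values of r should add up to 1, which is a useful check on any table built this way.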


For example, if the variable X ~ B(3, p), then X ∈ {0, 1, 2, 3}, and we have the following 
probabilities, where q = 1 − p. 

$P(X = 0) = \binom{3}{0} \times p^0 \times q^3 = 1q^3$ 
$P(X = 1) = \binom{3}{1} \times p^1 \times q^2 = 3pq^2$ 
$P(X = 2) = \binom{3}{2} \times p^2 \times q^1 = 3p^2q$ 
$P(X = 3) = \binom{3}{3} \times p^3 \times q^0 = 1p^3$ 

TIP: Coefficients for power 3 are 1, 3, 3 and 1. 

The coefficients in all binomial expansions are symmetric strings of integers. When 
arranged in rows, they form what has come to be known as Pascal’s triangle (named after 
the French thinker Blaise Pascal). Part of this arrangement is shown in the following 
diagram, which includes the coefficient for power 0 for completeness. 

REWIND: We saw in Chapter 5, Section 5.3 that $^nC_r = \frac{n!}{r!(n - r)!}$. 

TIP: Each way of obtaining a particular number of 6s has the same probability. 

TIP: For work involving binomial expansions, the notation $^nC_r$ is rarely used nowadays. 
Your calculator may use this notation but it has mostly been replaced by $\binom{n}{r}$. 

REWIND: We met a series of independent events with just two possible outcomes in the 
Explore 6.3 activity in Chapter 6, Section 6.3. 


WORKED EXAMPLE 7.1 

A regular pentagonal spinner is shown. Find the probability that 10 spins 
produce exactly three As. 


Answer 

Let the random variable X be the number of As obtained. We have 10 independent trials 
with a constant probability of a success, P(A) = 0.4. So X ~ B(10, 0.4), and we require 
three successes and seven failures. 

$P(X = 3) = \binom{10}{3} \times 0.4^3 \times 0.6^7$ 
$= \frac{10!}{3! \times 7!} \times 0.4^3 \times 0.6^7$ 
= 0.215 to 3 significant figures. 


WORKED EXAMPLE 7.2 

Given that X ~ B(8, 0.7), find P(X > 6), correct to 3 significant figures. 


Answer 

X ~ B(8, 0.7) tells us that n = 8, and that p = 0.7. 

$P(X > 6) = P(X = 7) + P(X = 8)$ 
$= \binom{8}{7} \times 0.7^7 \times 0.3^1 + \binom{8}{8} \times 0.7^8 \times 0.3^0$ 
= 0.197650... + 0.057648... 
= 0.255 

TIP: Remember that X represents the number of successes, so it can take integer values only. 

TIP: Premature rounding of probabilities in the working may lead to an incorrect 
final answer. Here, 0.198 + 0.0576 = 0.256. 
WORKED EXAMPLE 7.3 


In a particular country, 85% of the population has rhesus-positive (R+) blood. 


Find the probability that fewer than 39 people in a random sample of 40 have 
rhesus-positive blood. 


Answer 

Let X be the number of people in the sample who have rhesus-positive blood, so that 
X ~ B(40, 0.85). 

$P(X < 39) = 1 - [P(X = 39) + P(X = 40)]$ 
$= 1 - \left[\binom{40}{39} \times 0.85^{39} \times 0.15^1 + \binom{40}{40} \times 0.85^{40} \times 0.15^0\right]$ 
= 1 − [0.010604... + 0.001502...] 
= 0.988 


REWIND: Recall from Chapter 4, Section 4.1 that P(A) = 1 − P(A′). 

EXPLORE 7.1 


Binomial distributions can be investigated using the Binomial Distribution resource 
on the GeoGebra website. 


We could, for example, check our answer to Worked example 7.3 as follows. 


Click on the distribution tab and select binomial from the pop-up menu at the bottom-left. 
Select the parameters n = 40 and p = 0.85, and a bar chart representing the 
probability distribution will be generated. 


To find P(X < 39), enter into the boxes P(0 ≤ X ≤ 38) and, by tapping the chart, 
the value for this probability is displayed. (At the right-hand side you will see a list of 
the probabilities for the 41 possible values of X in this distribution.) 


WORKED EXAMPLE 7.4 


Given that X ~ B(n, 0.4) and that P(X = 0) < 0.1, find the least possible value of n. 


Answer 

$P(X = 0) = \binom{n}{0} \times 0.4^0 \times 0.6^n = 0.6^n$ 

So we need $0.6^n < 0.1$ 
$\log 0.6^n < \log 0.1$ 
$n \log 0.6 < \log 0.1$ 
$n > \frac{\log 0.1}{\log 0.6}$ 
$n > 4.50...$ 

n = 5 is the least possible value of n. 

TIP: Recall from IGCSE / O Level that if x > a, then −x < −a. We must reverse the 
inequality sign when we multiply or divide by a negative number, such as $\log_{10} 0.6$. 

TIP: n takes integer values only. 


Alternatively, we can solve $0.6^n < 0.1$ by trial and improvement. We know 
that n is an integer, so we evaluate $0.6^1, 0.6^2, 0.6^3, ...$ up to the first one whose 
value is less than 0.1. 

..., $0.6^3 = 0.216$; $0.6^4 = 0.1296$; $0.6^5 = 0.07776$. 

The least possible value is n = 5. 


EXERCISE 7A 


1 The variable X has a binomial distribution with n = 4 and p = 0.2. Find: 
a P(X = 4) b P(X = 0) c P(X = 3) d P(X = 3 or 4). 


2 Given that Y ~ B(7, 0.6), find: 
a P(Y = 7) b P(Y = 5) c P(Y ≠ 4) d P(3 < Y < 6). 


3 Given that W ~ B(9, 0.32), find: 


a P(W = 5) b P(W ≠ 5) c P(W < 2) d P(0 < W < 9). 
4 Given that V ~ B(8, >) find: 
a P(V = 4) b P(V = 7) c P(V ≤ 2) d P(3 ≤ V < 6) e P(V is an odd number). 


5 Find the probability that each of the following events occur. 
a Exactly five heads are obtained when a fair coin is tossed nine times. 
b Exactly two 6s are obtained with 11 rolls of a fair die. 


6 A man has five packets and each contains three brown sugar cubes and one white sugar cube. He randomly 
selects one cube from each packet. Find the probability that he selects exactly one brown sugar cube. 


7 A driving test is passed by 70% of people at their first attempt. Find the probability that exactly five out of 
eight randomly selected people pass at their first attempt. 


8 Research shows that the owners of 63% of all saloon cars are male. Find the probability that exactly 20 out of 
30 randomly selected saloon cars are owned by: 

a males b females. 

9 In a particular country, 58% of the adult population is married. Find the probability that exactly 12 out of 20 
randomly selected adults are married. 

10 A footballer has a 95% chance of scoring each penalty kick that she takes. Find the probability that she: 

a scores from all of her next 10 penalty kicks 

b fails to score from exactly one of her next seven penalty kicks. 


11 On average, 13% of all tomato seeds of a particular variety fail to germinate within 10 days of planting. Find the 
probability that 34 or 35 out of 40 randomly selected seeds succeed in germinating within 10 days of planting. 
12 There is a 15% chance of rain on any particular day during the next 14 days. Find the probability that, during 
the next 14 days, it rains on: 


a exactly 2 days b at most 2 days. 

13 A factory makes electronic circuit boards and, on average, 0.3% of them have a minor fault. Find the 
probability that a random sample of 200 circuit boards contains: 

a exactly one with a minor fault b fewer than two with a minor fault. 

14 There is a 50% chance that a six-year-old child drops an ice cream that they are eating. Ice creams are given to 
5 six-year-old children. 

a Find the probability that exactly one ice cream is dropped. 

b 45 six-year-old children are divided into nine groups of five and each child is given an ice cream. Calculate 
the probability that exactly one of the children in at most one of the groups drops their ice cream. 


15 A coin is biased such that heads is three times as likely as tails on each toss. The coin is tossed 12 times. 
The variables H and T are, respectively, the number of heads and the number of tails obtained. Find the 
value of $\frac{P(T = 7)}{P(H = 7)}$. 


16 Given that Q ~ B(n, 0.3) and that P(Q = 0) > 0.1, find the greatest possible value of n. 


17 The variable T ~ B(n, 0.96) and it is given that P(T = n) > 0.5. Find the greatest possible value of n. 
18 Given that R ~ B(n, 0.8) and that P(R > n − 1) < 0.006, find the least possible value of n. 


19 The number of damaged eggs, D, in cartons of six eggs has been recorded by an inspector at a packing 
depot. The following table shows the frequency distribution of some of the numbers of damaged eggs in 
150000 boxes. 

Number of damaged eggs (d)   0        1      2   3   4   5   6 
Number of cartons (f)        141393   8396   a   b   0   0   0 


The distribution of D is to be modelled by D ~ B(6, p). 

a Estimate a suitable value for p, correct to 4 decimal places. 

b Calculate estimates for the value of a and of b. 

c Calculate an estimate for the least number of additional cartons that would need to be inspected for there 
to be at least 8400 cartons containing one damaged egg. 


20 The number of months during the 4-month monsoon season (June to September) in which the total rainfall was 
greater than 5 metres, R, has been recorded at a location in Meghalaya for the past 32 years, and is shown in the 
following table. 

Number of months (r)   0   1   2    3   4 
Number of years (f)    2   8   12   8   2 


The distribution of R is to be modelled by R ~ B(4, p). 


a Find the value of p, and state clearly what this value represents. 


b Give a reason why, in real life, it is unlikely that a binomial distribution could be used to model these 
data accurately. 
21 In a particular country, 90% of both females and males drink tea. Of those who drink tea, 40% of the females 
and 60% of the males drink it with sugar. Find the probability that in a random selection of two females and 
two males: 


a all four people drink tea 
b an equal number of females and males drink tea with sugar. 
(PS) 22 It is estimated that 0.5% of all left-handed people and 0.4% of all right-handed people suffer from some form 
of colour-blindness. A random sample of 200 left-handed and 300 right-handed people is taken. Find the 
probability that there is exactly one person in the sample that suffers from colour-blindness. 


Although Pascal’s triangle is named after the 17th century 
French thinker Blaise Pascal, it was known about in China 
and in Persia as early as the 11th century. The earliest 
surviving display is of Jia Xian’s triangle in a work compiled 
in 1261 by Yang Hui, as shown in the photo. 
A frog sits on the bottom-left square of a 5 by 5 
grid. In each of the other 24 squares there is a lily 
pad and four of these have pink flowers growing 
from them, as shown in the image. 


The frog can jump onto an adjacent lily pad but it 
can only jump northwards (N) or eastwards (E). 


The four numbers on the grid represent the 
number of different routes the frog can take to 
get to those particular lily pads. For example, 
there are three routes to the lily pad with the number 3, and these routes are 
EEN, ENE and NEE. 


Sketch a 5 by 5 grid and write onto it the number of routes to all 24 lily pads. 
Describe any patterns that you find in the numbers on your grid. 


The numbers on the lily pads with pink flowers form a sequence. Can you continue 
this sequence and find an expression for its nth term? 
Expectation and variance of the binomial distribution

Expectation and standard deviation give a measure of central tendency and a measure of
variation for the binomial distribution. We can calculate these, along with the variance,
from the parameters n and p.


We saw in Chapter 6, Section 6.3 that expectation is a variable’s long-term average value.


Consider the variable X ~ B(2, 0.6), whose probability distribution is shown in the
following table.


x        | 0    | 1    | 2
P(X = x) | 0.16 | 0.48 | 0.36


Applying the formulae for E(X) and Var(X) gives the following results.


E(X) = Σxp = (0 × 0.16) + (1 × 0.48) + (2 × 0.36) = 1.2


Var(X) = Σx²p − {E(X)}² = (0² × 0.16) + (1² × 0.48) + (2² × 0.36) − 1.2² = 0.48


Our experiment consists of n = 2 trials with a probability of success p = 0.6 in each, so we
should not be surprised to find that E(X) = np = 2 × 0.6 = 1.2.


We saw in Chapter 4, Section 4.1 that event A is expected to occur n × P(A) times.


What may be surprising (and a very convenient result) is that the variance of X can also be
found from the values of the parameters n and p.


Var(X) = np(1 − p) = 2 × 0.6 × 0.4 = 0.48
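The two routes to the mean and variance can be compared numerically. The following is a minimal sketch, not part of the original text: the function names `binomial_pmf` and `mean_and_variance` are illustrative choices, and the code simply rebuilds the table for X ~ B(2, 0.6) and applies the general formulae for E(X) and Var(X).

```python
from math import comb


def binomial_pmf(n, p):
    """Return [P(X = 0), ..., P(X = n)] for X ~ B(n, p)."""
    return [comb(n, r) * p**r * (1 - p) ** (n - r) for r in range(n + 1)]


def mean_and_variance(probs):
    """Apply E(X) = sum of x*p and Var(X) = sum of x^2*p - E(X)^2 to a table."""
    mean = sum(x * px for x, px in enumerate(probs))
    var = sum(x**2 * px for x, px in enumerate(probs)) - mean**2
    return mean, var


n, p = 2, 0.6
probs = binomial_pmf(n, p)
mean, var = mean_and_variance(probs)

print([round(px, 2) for px in probs])      # the probability table
print(round(mean, 4), round(var, 4))       # from the table
print(round(n * p, 4), round(n * p * (1 - p), 4))  # from np and np(1 - p)
```

Changing `n` and `p` gives a quick way to see that the agreement is not special to this one distribution.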


KEY POINT 7.2


The mean and variance of X ~ B(n, p) are given by μ = np and σ² = np(1 − p) = npq.


Note that μ = E(X) and σ² = Var(X).


WORKED EXAMPLE 7.5 


Given that X ~B(12, 0.3), find the mean, the variance and the standard deviation of X. 


Answer
E(X) = np
= 12 × 0.3
= 3.6

Var(X) = np(1 − p)
= 12 × 0.3 × 0.7
= 2.52

SD(X) = √(np(1 − p))
= √2.52
= 1.59 to 3 significant figures

We can also write our answers as μ = 3.6, σ² = 2.52 and σ = 1.59.
WORKED EXAMPLE 7.6


The random variable X ~ B(n, p). Given that E(X) = 12 and Var(X) = 7.5, find:


a the value of n and of p


b P(X = 11).
Answer
a q = Var(X)/E(X) = 7.5/12 = 0.625

We use q = npq/np = Var(X)/E(X) to find p.

p = 1 − q = 0.375

n = E(X)/p = 12/0.375 = 32

E(X) = np, so n = E(X)/p.

b P(X = 11) = ³²C₁₁ × 0.375¹¹ × 0.625²¹
= 0.138

X ~ B(32, 0.375)
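The method of this worked example can be sketched in a few lines of code. This is an illustrative sketch only: the helper name `binomial_parameters` is hypothetical, and the value r = 11 in part b is an assumption, chosen because it reproduces the printed answer of 0.138 for X ~ B(32, 0.375).

```python
from math import comb


def binomial_parameters(mean, variance):
    """Recover n and p for X ~ B(n, p) from E(X) = np and Var(X) = npq."""
    q = variance / mean       # npq / np = q
    p = 1 - q
    n = round(mean / p)       # n = E(X) / p, which must be a whole number
    return n, p


n, p = binomial_parameters(12, 7.5)

# Assumed value of r for part b (see the note above).
r = 11
prob = comb(n, r) * p**r * (1 - p) ** (n - r)

print(n, p)
print(round(prob, 3))
```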


EXERCISE 7 K | 


1 Calculate the expectation, variance and standard deviation of each of the following discrete random
variables. Give non-exact answers correct to 3 significant figures.


a V ~ B(5, 0.2) b W ~ B(24, 0.55) c X ~ B(365, 0.18) d Y ~ B(20, √0.5)
2 Given that X ~ B(8, 0.25), calculate:

a E(X) and Var(X) b P[X = E(X)] c P[X < E(X)].
3 Given that Y ~ B(11, 0.23), calculate:

a P(Y ≠ 3) b P[Y < E(Y)].
4 Given that X ~ B(n, p), E(X) = 20 and Var(X) = 12, find:

a the value of n and of p b P(X = 21).


5 Given that G ~B(n, p), E(G)=245 and Var(G)= 10-5, find: 
a the parameters of the distribution of G b P(G=20). 


6 W has a binomial distribution, where E(W) = 2.7 and Var(W) = 0.27. Find the values of n and p and use
them to draw up the probability distribution table for W.


7 Give a reason why a binomial distribution would not be a suitable model for the distribution of X in each of
the following situations.
a X is the height of the tallest person selected when three people are randomly chosen from a group of 10.


b X is the number of girls selected when two children are chosen at random from a group containing one
girl and three boys.


c X is the number of motorbikes selected when four vehicles are randomly picked from a car park
containing 134 cars, 17 buses and nine bicycles.
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8 The variable Q~ p(n i} and its standard deviation is one-third ofits mean. Calculate the non-zero value of 


n and find P(5<0 <8). 


9 The random variable H ~ B(192, p), and E(H) is 24 times the standard deviation of H. Calculate the value of 
p and find the value of k, given that P(H =2)=kx2°”. 


10 It is estimated that 1.3% of the matches produced at a factory are damaged in some way. A household box 
contains 462 matches. 
a Calculate the expected number of damaged matches in a household box. 


b Find the variance of the number of damaged matches and the variance of the number of undamaged 
matches in a household box. 


c Show that approximately 10.4% of the household boxes are expected to contain exactly eight damaged matches. 
d Calculate the probability that at least one from a sample of two household boxes contains exactly eight 
damaged matches. 

11 On average, 8% of the candidates sitting an examination are awarded a merit. Groups of 50 candidates are 
selected at random. 
a How many candidates in each group are not expected to be awarded a merit? 
b Calculate the variance of the number of merits in the groups of 50. 
c Find the probability that: 


i three, four or five candidates in a group of 50 are awarded merits 


ii three, four or five candidates in both of two groups of 50 are awarded merits. 


7.2 The geometric distribution 


Consider a situation in which we are attempting to roll a 6 with an ordinary fair die. 


How likely are we to get our first 6 on the first roll; on the second roll; on the third roll, 
and so on? 


We can answer these questions using the constant probabilities of success and failure:
p and 1 − p.


P(first 6 on first roll) = p → a success.
P(first 6 on second roll) = (1 − p)p → a failure followed by a success.
P(first 6 on third roll) = (1 − p)²p → two failures followed by a success.


X, the number of trials up to and including the first success in a series
of repeated independent trials, is a discrete random variable whose distribution is called a
geometric distribution.


The following table shows the probability that the first success occurs on the rth trial.


r        | 1 | 2        | 3         | 4         | ... | n              | ...
P(X = r) | p | p(1 − p) | p(1 − p)² | p(1 − p)³ | ... | p(1 − p)^(n−1) | ...
The values of P(X = r) in the previous table are the terms of a geometric progression (GP)
with first term p and common ratio 1 − p. The sum of the probabilities is equal to the sum
to infinity of the GP.


Σ[P(X = r)] = S∞ = first term / (1 − common ratio) = p / (1 − (1 − p)) = 1


The sum of the probabilities in a geometric probability distribution is equal to 1.


REWIND
We saw in Chapter 6, Section 6.2 that Σp = 1 for a probability distribution. You
will also have seen geometric progressions and geometric series in
Pure Mathematics 1, Chapter 6.


A discrete random variable, X, is said to have a geometric distribution, and is defined by
its parameter p, if it meets the following criteria.


• The repeated trials are independent.
• The repeated trials can be infinite in number.
• There are just two possible outcomes for each trial (i.e. success or failure).
• The probability of success in each trial, p, is constant.
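The die-rolling situation described earlier gives a convenient numerical illustration. The sketch below is not from the original text: the function name `geometric_pmf` and the cut-off of 200 terms are illustrative assumptions. It lists the first few probabilities for rolling a first 6 and shows that their running total approaches 1.

```python
def geometric_pmf(p, r):
    """P(first success on trial r): r - 1 failures followed by one success."""
    return p * (1 - p) ** (r - 1)


p = 1 / 6  # probability of rolling a 6 with an ordinary fair die

first_four = [geometric_pmf(p, r) for r in range(1, 5)]

# The true sum runs over infinitely many terms; 200 terms is an arbitrary
# cut-off that is already extremely close to the sum to infinity.
partial_sum = sum(geometric_pmf(p, r) for r in range(1, 201))

print([round(x, 4) for x in first_four])
print(partial_sum)
```

Each probability is the previous one multiplied by 1 − p, which is why the terms form a geometric progression.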


KEY POINT 7.3

A random variable X that has a geometric distribution is denoted by X ~ Geo(p), and the
probability that the first success occurs on the rth trial is

P(X = r) = p(1 − p)^(r−1) for r = 1, 2, 3, ...


An alternative form of this formula, P(X = r) = q^(r−1) × p, where p = 1 − q, reminds us that the
r − 1 failures occur before the first success.


The binomial and geometric distributions arise in very similar situations. The significant
difference is that the number of trials in a binomial distribution is fixed from the start
and the number of successes is counted, whereas, in a geometric distribution, trials are
repeated as many times as necessary until the first success occurs.
For X ~ B(n, p), there are ⁿCᵣ ways to obtain r successes.

Recall from Section 7.1 that ⁿCᵣ = n!/(r!(n − r)!).

For X ~ Geo(p), there is only one way to obtain the first success on the rth trial, and that
is when there are r − 1 failures followed by a success.


WORKED EXAMPLE 7.7. 


Repeated independent trials are carried out in which the probability of success in each trial is 0.66. 
Correct to 3 significant figures, find the probability that the first success occurs: 

a on the third trial


b on or before the second trial


c after the third trial.
Answer
a P(X = 3) = p(1 − p)²
= 0.66 × 0.34²
= 0.0763
b P(X ≤ 2) = P(X = 1) + P(X = 2)
= p + p(1 − p)
= 0.884


c P(X > 3) = 1 − P(X ≤ 3)
= 1 − [P(X = 1) + P(X = 2) + P(X = 3)]
= 1 − [p + p(1 − p) + p(1 − p)²]
= 0.0393
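The three answers can be reproduced by direct summation. This is a small sketch under the assumptions of the worked example (p = 0.66); the variable names are illustrative.

```python
p = 0.66      # probability of success in each trial
q = 1 - p     # probability of failure in each trial

# a: two failures followed by a success
part_a = p * q**2

# b: first success on trial 1 or on trial 2
part_b = p + p * q

# c: complement of "first success on trial 1, 2 or 3"
part_c = 1 - (p + p * q + p * q**2)

print(round(part_a, 4), round(part_b, 3), round(part_c, 4))
```

Part c equals q³, the probability that all of the first three trials fail, which anticipates the shortcut for inequalities given next.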


Probabilities that involve inequalities can be found by summation for small values of r,
as in parts b and c of Worked example 7.7. However, for larger values of r, the following
results will be useful.


P(X ≤ r) = P(success on one of the first r trials) = 1 − P(failure on the first r trials)


P(X > r) = P(first success after the rth trial) = P(failure on the first r trials)


These two results are written in terms of q in Key point 7.4.


KEY POINT 7.4

When X ~ Geo(p) and q = 1 − p, then P(X ≤ r) = 1 − q^r and P(X > r) = q^r.
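A quick numerical comparison shows that summing the individual probabilities and using powers of q give the same values. The sketch below is illustrative only: the function name `prob_at_most` and the values p = 0.3 and r = 12 are hypothetical choices, not taken from the text.

```python
def prob_at_most(p, r):
    """P(X <= r) found by adding P(X = 1), P(X = 2), ..., P(X = r)."""
    q = 1 - p
    return sum(p * q ** (k - 1) for k in range(1, r + 1))


p, r = 0.3, 12          # hypothetical parameter and number of trials
q = 1 - p

by_summation = prob_at_most(p, r)
by_formula = 1 - q**r   # 1 - P(failure on the first r trials)
tail = q**r             # P(X > r)

print(round(by_summation, 6), round(by_formula, 6), round(tail, 6))
```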


WORKED EXAMPLE 7.8


In a particular country, 18% of adults wear contact lenses. Adults are randomly selected and interviewed one at a
time. Find the probability that the first adult who wears contact lenses is:


a one of the first 15 interviewed


b not one of the first nine interviewed.


Answer
a P(X ≤ 15) = 1 − q¹⁵
= 1 − 0.82¹⁵
= 0.949
b P(X > 9) = q⁹
= 0.82⁹
= 0.168
WORKED EXAMPLE 7.9


A coin is biased such that the probability of obtaining heads with each toss is equal
to 2. The coin is tossed until the first head is obtained. Find the probability that the
coin is tossed:

a at least six times


b fewer than eight times.


Answer
Let X represent the number of times the coin is tossed up to and including the first heads;
then X follows a geometric distribution.

a P(X ≥ 6) = P(X > 5)

‘At least six times’ has the same meaning as ‘more than five times’.

b P(X < 8) = P(X ≤ 7)

‘Fewer than eight times’ has the same meaning as ‘seven or fewer times’.


1 Given the discrete random variable X ~ Geo(0.2), find:


a P(X = 7) b P(X ≠ 5) c P(X > 4).
2 Given that T ~ Geo(0.32), find:
a P(T = 3) b P(T < 6) c P(T > 7).


3 The probability that Mike is shown a yellow card in any football match that he plays is = Find the 


probability that Mike is next shown a yellow card: 


a in the third match that he plays


b before the fourth match that he plays. 


4 On average, Diya concedes one penalty in every six hockey matches that she plays. Find the probability that 


Diya next concedes a penalty: 


a in the eighth match that she plays


b after the fourth match that she plays. 


5 The sides of a fair 5-sided spinner are marked 1, 1, 2,3 and 4. It is spun until the first score of 1 is obtained. 
Find the probability that it is spun: 


a exactly twice 
6 It is known that 80% of the customers at a DIY store own a discount card. Customers queuing at a checkout
are asked if they own a discount card. 


a Find the probability that the first customer who owns a discount card is: 
i the third customer asked ii not one of the first four customers asked. 


b Given that 10% of the customers with discount cards forget to bring them to the store, find the probability 
that the first customer who owns a discount card and remembered to bring it to the store is the second 
customer asked. 


7 In a manufacturing process, the probability that an item is faulty is 0.07. Items from those produced are
selected at random and tested.
a Find the probability that the first faulty item is:
i the 12th item tested ii not one of the first 10 items tested iii one of the first eight items tested. 
b What assumptions have you made about the occurrence of faults in the items so that you can calculate the 
probabilities in part a? 
8 Two independent random variables are X ~ Geo(0.3) and Y ~ Geo(0.7). Find: 
a P(X = 2) b P(Y = 2) c P(X = 1 and Y = 1).
9 On average, 14% of the vehicles being driven along a stretch of road are heavy goods vehicles (HGVs). A girl 


stands on a footbridge above the road and counts the number of vehicles, up to and including the first HGV
that passes. Find the probability that she counts:


a at most three vehicles b at least five vehicles.


10 The probability that a woman can connect to her home Wi-Fi at each attempt is 0.44. Find the probability 
that she fails to connect until her fifth attempt. 


11 Decide whether or not it would be appropriate to model the distribution of X by a geometric distribution in 
the following situations. In those cases for which it is not appropriate, give a reason. 


a A bag contains two red sweets and many more green sweets. A child selects a sweet at random and eats it,
selects another and eats it, and so on. X is the number of sweets selected and eaten, up to and including
the first red sweet.


b A monkey sits in front of a laptop with a blank word processing document on its screen. X is the number
of keys pressed by the monkey, up to and including the first key pressed that completes a row of three
letters that form a meaningful three-letter word.


c X is the number of times that a grain of rice is dropped from a height of 2 metres onto a chessboard, up to
and including the first time that it comes to rest on a white square.


d X is the number of races in which an athlete competes during a year, up to and including the first race that 
he wins. 


12 The random variable T has a geometric distribution and it is given that ARES = 15.625. Find P(T = 3). 


13 X ~ Geo(p) and P(X = 2) = 0.2464. Given that p < 0.5, find P(X > 3).


(Ps) 14 Given that X ~ Geol p) and that P(X <4)= = find PAS X <4). 
15 Two ordinary fair dice are rolled simultaneously. Find the probability of obtaining:
a the first double on the fourth roll


b the first pair of numbers with a sum of more than 10 before the 10th roll.


16 X ~ Geo(0.24) and Y ~ Geo(0.25) are two independent random variables. Find the probability that X + Y = 4.


Mode of the geometric distribution 

All geometric distributions have two features in common. These are clear to see when bar 
charts or vertical line graphs are used to represent values of P(X =r) for different values 
of the parameter p. You can do this manually or using a graphing tool such as GeoGebra. 


The first common feature is that P(X = 1) has the greatest probability in all geometric
distributions. This means that the most likely value of X is 1, so the first success is most
likely to occur on the first trial. Secondly, the value of P(X = r) decreases as r increases.
This is because the common ratio between the probabilities (q = 1 − p) is less than 1:


p > p(1 − p) > p(1 − p)² > p(1 − p)³ > p(1 − p)⁴ > ...


The mode of all geometric distributions is 1.


The following table shows some probabilities for the distributions X ~ Geo(0.2) and
X ~ Geo(0.7). In both distributions, we can see that probabilities decrease as the value of
X increases.


r                 | 1   | 2    | 3     | 4      | 5
P(X = r), p = 0.2 | 0.2 | 0.16 | 0.128 | 0.1024 | 0.08192
P(X = r), p = 0.7 | 0.7 | 0.21 | 0.063 | 0.0189 | 0.00567

Expectation of the geometric distribution

Recall that the expectation or mean of a discrete random variable is its long-term average,
which is given by E(X) = μ = Σxp.


Applying this to the geometric distribution Geo(p), we find that the mean is
equal to 1/p, the reciprocal of p.


If X ~ Geo(p) then μ = 1/p.


Using algebra, we can prove that the mean of the geometric distribution is equal to 1/p.


We studied the expectation of a discrete random variable in Chapter 6, Section 6.3.


For X ~ Geo(p), we have X ∈ {1, 2, 3, 4, ...} and pₓ = {p, pq, pq², pq³, ...}.


Step 1 of the proof is to form an equation that expresses μ in terms of p and q.
To do this we use μ = Σxp.


There are three more steps required to complete the proof, which you might like to
try without any further assistance. However, some guidance is given below if needed.


Step 2: Multiply the equation obtained in step 1 throughout by q to obtain a second
equation.

Step 3: Subtract one equation from the other.


Step 4: If you have successfully managed steps 1, 2 and 3, you should need no help
completing the proof!
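Before or after attempting the proof, the result can be checked numerically. The sketch below is not a proof and is not from the original text: the function name `truncated_mean`, the cut-off of 2000 terms and the three values of p are illustrative assumptions. It adds up the first terms of Σxp and compares the total with 1/p.

```python
def truncated_mean(p, terms=2000):
    """Add the first `terms` terms of the sum of r * P(X = r) for X ~ Geo(p)."""
    q = 1 - p
    return sum(r * p * q ** (r - 1) for r in range(1, terms + 1))


# Hypothetical values of p; for these the omitted terms are negligibly small.
results = {p: truncated_mean(p) for p in (0.2, 0.25, 0.7)}

for p, mean in results.items():
    print(p, round(mean, 6), round(1 / p, 6))
```

For very small values of p more terms would be needed, because the probabilities then decrease slowly.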


WORKED EXAMPLE 7.10 


One in four boxes of Zingo breakfast cereal contains a free toy. Let the random
variable X be the number of boxes that a child opens, up to and including the one in
which they find their first toy.


a Find the mode and the expectation of X.


b Interpret the two values found in part a in the context of this question.


Answer
a The mode of X is 1.

E(X) = 1/p = 1 ÷ (1/4) = 4

b A child is most likely to find their first toy in the first box they open but, on
average, a child will find their first toy in the fourth box that they open.

An answer written ‘in context’ must refer to a specific situation; in this case, the
situation described in the question.


WORKED EXAMPLE 7.11


The variable X follows a geometric distribution. Given that E(X) = 3½, find P(X > 6).


Answer

E(X) = 1/p = 7/2, so p = 2/7 and q = 5/7

We find the parameter p and then we find q.

P(X > 6) = q⁶
= (5/7)⁶
= 0.133
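The working for this example can be followed with exact fractions. This is a sketch under one stated assumption: the given mean is taken to be E(X) = 7/2, the value that is consistent with the printed answer of 0.133. The variable names are illustrative.

```python
from fractions import Fraction

mean = Fraction(7, 2)   # assumed value of E(X); see the note above
p = 1 / mean            # E(X) = 1/p, so p is the reciprocal of the mean
q = 1 - p               # probability of failure in each trial

prob_more_than_6 = q**6  # P(X > 6) = P(failure on the first six trials)

print(p, q)
print(prob_more_than_6, round(float(prob_more_than_6), 3))
```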
WORKED EXAMPLE 7.12


Given that X ~ Geo(p) and that P(X ≤ 3) = 819/1331, find:
a P(X > 3)
b P(1 < X ≤ 3).
Answer
a P(X > 3) = 1 − P(X ≤ 3)
= 1 − 819/1331
= 512/1331

b 1 − q³ = P(X ≤ 3)
q³ = 1 − 819/1331 = 512/1331
q = 8/11 and p = 3/11

We use 1 − q^r = P(X ≤ r) to find q and p.

P(1 < X ≤ 3) = P(X = 2) + P(X = 3)
= pq + pq²
= (3/11 × 8/11) + (3/11 × (8/11)²)
= 456/1331

Alternatively, we can use
P(1 < X ≤ 3) = P(X ≤ 3) − P(X = 1)
= 819/1331 − 3/11
= 456/1331


1 Given that X ~ Geo(0.36), find the exact value of E(X). 


2 The random variable Y follows a geometric distribution. Given that P(Y =1)=0.2, find E(Y). 
3 Given that S~ Geo(p) and that E(S)= 4l, find P(S = 2). 


4 Let T be the number of times that a fair coin is tossed, up to and including the toss on which the first tail is
obtained. Find the mode and the mean of T.


5 Let X be the number of times an ordinary fair die is rolled, up to and including the roll on which the first 6 is
obtained. Find E(X) and evaluate P[X > E(X)].


6 A biased 4-sided die is numbered 1, 3,5 and 7. The probability of obtaining each score is proportional to that 
score. 


a Find the expected number of times that the die will be rolled, up to and including the roll on which the 
first non-prime number is obtained. 


b Find the probability that the first prime number is obtained on the third roll of the die. 
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7 Sylvie and Thierry are members of a choir. The probabilities that they can sing a perfect high C note on each 
attempt are = and 2, respectively. 
a Who is expected to fail fewer times before singing a high C note for the first time? 
b Find the probability that both Sylvie and Thierry succeed in singing a high C note on their second 


attempts. 


8 A standard deck of 52 playing cards has an equal number of hearts, spades, clubs and diamonds. A deck is
shuffled and a card is randomly selected. Let X be the number of cards selected, up to and including the first
diamond.


a Given that X follows a geometric distribution, describe the way in which the cards are selected, and give 
the reason for your answer. 


b Find the probability that: 
i X is equal to E(X)
ii neither of the first two cards selected is a heart and the first diamond is the third card selected.
9 A study reports that a particular gene in 0.2% of all people is defective. X is the number of randomly selected


people, up to and including the first person that has this defective gene. Given that P(X < b) > 0.865, find 
E(X) and find the smallest possible value of b. 


10 Anouar and Zane play a game in which they take turns at tossing a fair coin. The first person to
toss heads is the winner. Anouar tosses the coin first, and the probability that he wins the game is
0.5¹ + 0.5³ + 0.5⁵ + 0.5⁷ + ...


a Describe the sequence of results represented by the value 0.5% in this series. 
b Find, in a similar form, the probability that Zane wins the game.


c Find the probability that Anouar wins the game. 


EXPLORE 7.4 


In a game for two people that cannot be drawn, you are the stronger player
with a 60% chance of winning each game.


The probability distributions for the number of games won by you and those
won by your opponent when a single game is played, X and Y, are shown.


x        | 0    | 1
P(X = x) | 0.40 | 0.60

y        | 0    | 1
P(Y = y) | 0.60 | 0.40


Investigate the probability distributions for X and Y in a best-of-three
contest, where the first player to win two games wins the contest.


Who gains the advantage as the number of games played in a contest
increases? What evidence do you have to support your answer?


How likely are you to win a best-of-five contest?
Given that X ~ B(n, 1/n), find an expression for P(X = 1) in terms of n. [2]


A family has booked a long holiday in Skragness, where the probability of rain on any particular day is 0.3.
Find the probability that:


a the first day of rain is on the third day of their holiday [1]
b it does not rain for the first 2 weeks of their holiday. [2] 


One plastic robot is given away free inside each packet of a certain brand of biscuits. There are four colours of 
plastic robot (red, yellow, blue and green) and each colour is equally likely to occur. Nick buys some packets of 
these biscuits. Find the probability that 


i he gets a green robot on opening his first packet, [1] 

ii he gets his first green robot on opening his fifth packet. [2] 

Nick’s friend Amos is also collecting robots. 

iii Find the probability that the first four packets Amos opens all contain different coloured robots. [3] 
Weiqi has two fair triangular spinners. The sides of one spinner are labelled 1, 2, 3, and the sides of the other are
labelled 2, 3, 4. Weiqi spins them simultaneously and notes the two numbers on which they come to rest. 


a Find the probability that these two numbers differ by 1. [2] 


b Weiqi spins both spinners simultaneously on 15 occasions. Find the probability that the numbers on 
which they come to rest do not differ by 1 on exactly eight or nine of the 15 occasions. [3] 


A computer generates random numbers using any of the digits 0, 1, 2, 3, 4, 5, 6, 7, 8, 9. The numbers appear on
the screen in blocks of five digits, such as [50119] [26317] [40068] ....... Find the probability that:


a there are no 7s in the first block [1]
b the first zero appears in the first block [1] 
c the first 9 appears in the second block. [2] 


Four ordinary fair dice are rolled. 
a In how many ways can the four numbers obtained have a sum of 22? [2] 
b Find the probability that the four numbers obtained have a sum of 22. [2] 


c The four dice are rolled on eight occasions. Find the probability that the four numbers obtained have a 
sum of 22 on at least two of these occasions. [3] 


When a certain driver parks their car in the evenings, they are equally likely to remember or to forget to switch 
off the headlights. Giving your answers in their simplest index form, find the probability that on the next 
16 occasions that they park their car in the evening, they forget to switch off the headlights: 


a 14 more times than they remember to switch them off [2] 
b at least 12 more times than they remember to switch them off. [3] 


Gina has been observing students at a university. Her data indicate that 60% of the males and 70% of the 
females are wearing earphones at any given time. She decides to interview randomly selected students and to 
interview males and females alternately. 


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a Use Gina’s observation data to find the probability that the first person not wearing earphones is the third 
male interviewed, given that she first interviews: 


i amale [2] 
ii a female [2] 
iii a male who is wearing earphones. [2] 
b State any assumptions made about the wearing of earphones in your calculations for part a. [1] 


In Restaurant Bijoux 13% of customers rated the food as ‘poor’, 22% of customers rated the food as 
‘satisfactory’ and 65% rated it as ‘good’. A random sample of 12 customers who went for a meal at 
Restaurant Bijoux was taken. 


i Find the probability that more than 2 and fewer than 12 of them rated the food as ‘good’. [3] 
On a separate occasion, a random sample of n customers who went for a meal at the restaurant was taken. 


ii Find the smallest value of n for which the probability that at least 1 person will rate the food as ‘poor’ is 
greater than 0.95. [3] 
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A biased coin is four times as likely to land heads up compared with tails up. The coin is tossed k times so that 
the probability that it lands tails up on at least one occasion is greater than 99%. Find the least possible 
value of k. [4] 


1 Given that X ~ B(n, 0.4) and that P(X =1)=k x P(X =n-1), express the constant k in terms of n, and 


find the smallest value of n for which k > 25. [5] 


A book publisher has noted that, on average, one page in eight contains at least one spelling error, one page 
in five contains at least one punctuation error, and that these errors occur independently and at random. The 
publisher checks 480 randomly selected pages from various books for errors. 


a How many pages are expected to contain at least one of both types of error? [2] 


b Find the probability that: 


i the first spelling error occurs after the 10th page [2] 
ii the first punctuation error occurs before the 10th page [2] 
iii the 10th page is the first to contain both types of error. [2] 


Robert uses his calculator to generate 5 random integers between 1 and 9 inclusive. 
i Find the probability that at least 2 of the 5 integers are less than or equal to 4. [3] 


Robert now generates n random integers between 1 and 9 inclusive. The random variable X is the number of
these n integers which are less than or equal to a certain integer k between 1 and 9 inclusive. It is given that
the mean of X is 96 and the variance of X is 32.


ii Find the values of n and k. [4] 
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Anna, Bel and Chai take turns, in that order, at rolling an ordinary fair die. The first person to roll a 6 wins 
the game. 


Find the ratio P(Anna wins): P(Bel wins): P(Chai wins), giving your answer in its simplest form. [7] 
In this chapter you will learn how to:


• sketch normal curves to illustrate distributions or probabilities
• use a normal distribution to model a continuous random variable and use normal
distribution tables
• solve problems concerning a normally distributed variable
• recognise conditions under which the normal distribution can be used as an approximation to
the binomial distribution, and use this approximation, with a continuity correction, in solving
problems.
Where it comes from | What you should be able to do | Check your skills
Chapter 7 | Find and calculate with the expectation and variance of a binomial distribution. | 1 Given X ~ B(45, 0.52), find E(X) and Var(X).
          |  | 2 Given that X follows a binomial distribution with E(X) = 11.2 and Var(X) = 7.28, find the parameters of the distribution of X.


Why are errors quite normal?


If you study any of the sciences, you will be required at some time to measure a quantity
as part of an experiment. That quantity could be a measurement of time, mass, distance,
volume and so on. Whatever it is, any measurement you make of a continuous quantity 
such as these will be subject to error. The very nature of continuous quantities means that 
they cannot be measured precisely and, no matter how hard we try, inaccuracy is also 
likely because our tools lack perfect calibration and we, as human beings, add in a certain 
amount of unreliability. 


However, small errors are more likely than large errors and our measurements are usually 
just as likely to be underestimates as overestimates. When repeated measurements are 
taken, errors are likely to cancel each other out, so the average error is close to zero and the 
average of the measurements is virtually error-free. 




This chapter serves as an introduction to the idea of a continuous random variable and the 
method used to display its probability distribution. We will later focus our attention on one 
particular type of continuous random variable, namely a normal random variable. 


The normal distribution was discovered in the late 18th century by the German 
mathematician Carl Friedrich Gauss through research into the measurement errors made 
in astronomical observations. Some key properties of the normal distribution are that 
values close to the average are most likely; the further values are from the average, the less 
likely they are to occur, and the distribution is symmetrical about the average. 


8.1 Continuous random variables 


A continuous random variable is a quantity that is liable to change and whose infinite 
number of possible values are the numerical outcomes of a random phenomenon. 
Examples include the amount of sugar in an orange, the time required to run a marathon, 
measurements of height and temperature and so on. A continuous random variable is not 
defined for specific values. Instead, it is defined over an interval of values. 


Consider the mass of an apple, denoted by X grams. Within the range of possible masses, 
X can take any value, such as 111.2233..., or 137.8642..., or 145.2897..., or .... The 
probability that X takes a particular value is necessarily equal to 0, since the number of 
values that it can take is infinite. However, there will be a countable number of values in
any chosen interval, such as 130 < X < 140, so a probability for each and every interval can 
be found. 


The probability distribution of a discrete random variable shows its specific values and 
their probabilities, as we saw in Chapters 6 and 7. 
The probability distribution of a continuous random variable shows its range of values and
the probabilities for intervals within that range.


• When X is a discrete random variable, we can represent P(X = r).
• When X is a continuous random variable, we can represent P(a ≤ X ≤ b).


Before looking at probability distributions for continuous random variables in detail, we
will consider how we can represent the probability distribution of a set of collected or
observed continuous data.


Representation of a probability distribution 

A set of continuous data can be illustrated in a histogram, where column areas are
proportional to frequencies. To illustrate the probability distribution of a set of data, we
draw a graph that is based on the shape of a histogram, as we now describe.


If we change the frequency density values on the vertical axis to relative frequency density
values (relative frequency density = relative frequency ÷ class width) then column areas
will represent relative frequencies, which are estimates of probabilities. The vertical axis of
the diagram can now be labelled ‘probability density’.


For equal-width class intervals, the process described above has no effect on the ‘shape’ of
the diagram. The result is that the total area of the columns changes from ‘Σf’ to 1, which
is the sum of the probabilities of all the possible values.
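The change of scale described above can be illustrated with a short calculation. The sketch below uses hypothetical data: the frequencies, the class widths and the function name `relative_frequency_densities` are all assumptions made for the example, not values from the text.

```python
def relative_frequency_densities(frequencies, class_widths):
    """Relative frequency density = (frequency / total frequency) / class width."""
    total = sum(frequencies)
    return [f / total / w for f, w in zip(frequencies, class_widths)]


# Hypothetical grouped data: four classes with unequal widths.
frequencies = [4, 10, 16, 10]
class_widths = [5, 5, 10, 20]

densities = relative_frequency_densities(frequencies, class_widths)

# Column area = height * width = relative frequency of the class.
areas = [d * w for d, w in zip(densities, class_widths)]
total_area = sum(areas)

print(densities)
print(areas, total_area)
```

Each column area is the relative frequency of its class, so the areas estimate probabilities and add up to 1 whatever the class widths are.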


So we can draw a curved graph over the columns of an equal-width interval histogram 
(preferably one displaying large amounts of data with many classes) to model the 
probability distribution of a set of continuous data. 


In the case of a random variable, such a curved graph represents a function, y = f(x), 
and is called a probability density function, abbreviated to PDF or pdf. The area under the 
graph of the PDF is also equal to 1. 


A curved graph is sketched over each of the histograms in the diagram below. 


[Figure: two histograms, each with a curved graph y = f(x) sketched over it; in each diagram the area under the curve = 1.]


If you were asked to describe these two curves, you may be tempted to say that the curve 
on the right is ‘a bit odd’ and that the curve on the left is ‘a bit more normal’... and you 
would be quite right in doing so, as you will see shortly. 


We saw how to display continuous data in a histogram in Chapter 1, Section 1.3.

The word function should only be used when referring to a random variable. For data, we should rather use curve and/or graph.

Three commonly occurring types of curved graph are shown in the diagrams below.

[Figure: three curved graphs. Negatively skewed: longer tail to the left. Symmetric: even tails. Positively skewed: longer tail to the right.]

The mode is located at the graph’s peak. The median is at the value where the area under the graph is divided into two equal parts; this value can be found by calculation from the histogram or estimated from a cumulative frequency graph.


EXPLORE 8.1


Three frequency distributions are shown in the tables below. 


Use a histogram to sketch a graph representing the probability distribution for each 
of w, x and y. 


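w | 3 ≤ w < 6 | 6 ≤ w < 9 | 9 ≤ w < 12 | 12 ≤ w < 15 | 15 ≤ w < 18 | 18 ≤ w < 21 | 21 ≤ w < 24
Frequency | 13 | 13 | 13 | 13 | 13 | 13 | 13


x | 3 ≤ x < 6 | 6 ≤ x < 9 | 9 ≤ x < 12 | 12 ≤ x < 15 | 15 ≤ x < 18 | 18 ≤ x < 21 | 21 ≤ x < 24
Frequency | 3 | 9 | 18 | 24 | 18 | 9 | 3


y | 3 ≤ y < 6 | 6 ≤ y < 9 | 9 ≤ y < 12 | 12 ≤ y < 15 | 15 ≤ y < 18 | 18 ≤ y < 21 | 21 ≤ y < 24
Frequency | 8 | 19 | 10 | 4 | 10 | 19 | 8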


Discuss and describe the shapes of the three graphs. What feature do they have in 
common? 


Compare the measures of central tendency (averages) for w, x and y. 


The normal curve 


The frequency distribution of x in the Explore 8.1 activity produces a special type of 
curved graph. It is a symmetric, bell-shaped curve, known as a normal curve. 


If a probability distribution is represented by a normal curve, then: 


• Mean = median = mode

• The peak of the curve is at the mean (μ), and this is where we find the curve’s line of 
symmetry 

• Probability density decreases as we move away from the mean on both sides, so the 
further the values are from the mean, the less likely they are to occur 

• An increase in the standard deviation (σ) means that values become more spread out 
from the mean. This results in the curve’s width increasing and its height decreasing, so 
that the area under the graph is kept at a constant value of 1. 


Graphs that represent probability distributions of related sets of data, such as the heights 
of the boys and the heights of the girls at your school, can be represented on the same 
diagram, so that comparisons can be made. 


The following diagram shows two pairs of normal curves with their means and standard 
deviations compared. Note that the areas under the graphs in each pair are equal. 
As we can see, A and B have the same mean, but the shapes of the normal curves are 
different because they do not have the same standard deviation. Curve B is obtained from 
curve A by stretching it both vertically (from the horizontal axis) and horizontally (from 
the line of symmetry). 


X and Y have identically-shaped normal curves because they have the same standard 
deviation, but their positions or locations are different because they have different means. 
Each curve can be obtained from the other by a horizontal translation. 


EXPLORE 8.2


You can investigate the effect of altering the mean and/or standard deviation on 
the location and shape of a normal curve by visiting the Density Curve of Normal 
Distribution resource on the GeoGebra website. 


Note that the area under the curve is always equal to 1, whatever the values of μ and σ. 


EXERCISE 8A 


1 The probability distributions for A and B are represented in the diagram.

[Figure: curves for A and B; vertical axis: probability density.]

Indicate whether each of the following statements is true or false.

a μ_A > μ_B
b σ_A < σ_B
c A and B have the same range of values.
d σ_A² = σ_B²
e At least half of the values in B are greater than μ_A.
f At most half of the values in A are less than μ_B.

2 The diagram shows normal curves for the probability distributions of P and Q, that each contain n values.

[Figure: normal curves for P and Q; vertical axis: probability density.]

a Write down a statement comparing:
i σ_P and σ_Q
ii the median value for P and the median value for Q
iii the interquartile range for P and the interquartile range for Q.

b The datasets P and Q are merged to form a new dataset denoted by W.
i Describe the range of W.
ii Is the probability distribution for W a normal curve? Explain your answer.
iii Copy the diagram above and sketch onto it a curved graph representing the probability distribution for W. Mark the relative positions of μ_P, μ_Q and μ_W along the horizontal axis.


3 The distributions of the heights of 1000 women and of 1000 men both produce normal curves, as shown. The mean height of the women is 160 cm and the mean height of the men is 180 cm.

The heights of these women and men are now combined to form a new set of data. Assuming that the combined heights also produce a normal curve, copy the graph opposite and sketch onto it the curve for the combined heights of the 2000 women and men.

[Figure: normal curves labelled women (centred at 160) and men (centred at 180); horizontal axis: Height (cm); vertical axis: Probability density.]

4 Probability distributions for the quantity of apple juice in 500 apple juice tins and for the quantity of peach juice in 500 peach juice tins are both represented by normal curves.

The mean quantity of apple juice is 340 ml with variance 4 ml², and the mean quantity of peach juice is 340 ml with standard deviation 4 ml.

[Figure: normal curve labelled apple juice, centred at 340; horizontal axis: Volume (ml); vertical axis: Probability density.]

a Copy the diagram and sketch onto it the normal curve for the quantity of peach juice in the peach juice tins.

b Describe the curves’ differences and similarities.

5 The masses of 444 newborn babies in the USA and 888 newborn babies in the UK both produce normal curves. For the USA babies, μ = 3.4 kg and σ = 200 g; for the UK babies, μ = 3.3 kg and σ² = 36 100 g².

a On a single diagram, sketch and label these two normal curves.

b Describe the curves’ differences and similarities.

6 The values in two datasets, whose probability distributions are both normal curves, are summarised by the 
following totals: 

Σx² = 35 000, Σx = 12 000 and n = 5000.
Σy² = 72 000, Σy = 26 000 and n = 10 000.

a Show that the centre of the curve for y is located to the right of the curve for x.

b On the same diagram, sketch a normal curve for each dataset. 


8.2 The normal distribution 


In Section 8.1, we saw how a curved graph can be used to represent the probability 
distribution of a set of continuous data. A curved graph that represents the probability 
distribution of a continuous random variable, as stated previously, is called a probability 
density function or PDF. 


If we collect data on, say, the masses of a randomly selected sample of 1000 pineapples, we 
can produce a curved graph to illustrate the probabilities for the full and limited range of 
these masses. If there are no pineapples with masses under 0.2 kg or over 6kg, then our 
graph will indicate that P(mass < 0.2) = 0 and P(mass > 6) = 0. 


However, the continuous random variable ‘the possible mass of a pineapple’ is a theoretical 
model for the probability distribution. In the model, masses of less than 0.2 kg and 
masses of more than 6 kg would be shown to be extremely unlikely, but not impossible. 
The continuous random variable would, therefore, indicate that P(mass < 0.2) > 0 and 
P(mass > 6) > 0. 


[Incidentally, the greatest ever recorded mass of a pineapple is 8.28 kg!] 


The probability distribution of a continuous random variable is a mathematical function 
that provides a method of determining probabilities for the occurrence of different 
outcomes or observations. 


If the random variable X is normally distributed with mean μ and variance σ², then its 
equation is 

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left\{-\frac{(x-\mu)^2}{2\sigma^2}\right\} \quad \text{for all real values of } x.$$

exp{ } means the number e = 2.71828..., raised to the power in the bracket, and e^p > 0 for any power p.

The parameters that define a normally distributed random variable are its mean μ and its 
variance σ².

To describe the normally distributed random variable X, we write X ~ N(μ, σ²). 


KEY POINT 8.1

X ~ N(μ, σ²) describes a normally distributed random variable.

We read this as ‘X has a normal distribution with mean μ and variance σ²’.


The probability that X takes a value between a and b is equal to the area under the curve 
between the x-axis and the boundary lines x = a and x = b. 

The area under any part of the curve is the same, whether or not the boundary values are included. P(3 ≤ X ≤ 7), P(3 ≤ X < 7), P(3 < X ≤ 7) and P(3 < X < 7) are indistinguishable.

The area under the graph of y = f(x) can be found by integration:

$$P(a \le X \le b) = \int_a^b f(x)\,dx$$

Unfortunately, it is not possible to perform this integration exactly using the standard 
methods of integration but, as we will see later, mathematicians have found ways to handle this challenge. 
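Although the integral has no exact closed form, the area can be approximated numerically. The following is a minimal Python sketch that uses the trapezium rule; the function names and the parameters μ = 50 and σ = 4 are assumptions made purely for illustration.

```python
import math

def normal_pdf(x, mu, sigma):
    """Value of the normal PDF f(x) for mean mu and standard deviation sigma."""
    coefficient = 1 / (sigma * math.sqrt(2 * math.pi))
    exponent = -((x - mu) ** 2) / (2 * sigma ** 2)
    return coefficient * math.exp(exponent)

def area_under_pdf(a, b, mu, sigma, strips=10_000):
    """Approximate P(a <= X <= b) using the trapezium rule with equal strips."""
    width = (b - a) / strips
    total = 0.5 * (normal_pdf(a, mu, sigma) + normal_pdf(b, mu, sigma))
    for i in range(1, strips):
        total += normal_pdf(a + i * width, mu, sigma)
    return total * width

# Hypothetical parameters, chosen only for illustration.
mu, sigma = 50, 4

# Area within one standard deviation of the mean.
within_one_sd = area_under_pdf(mu - sigma, mu + sigma, mu, sigma)

# Area over a very wide interval, which should be very close to 1.
almost_everything = area_under_pdf(mu - 8 * sigma, mu + 8 * sigma, mu, sigma)

print(round(within_one_sd, 4))
print(round(almost_everything, 4))
```

Changing μ or σ in this sketch moves or reshapes the curve, but the area within one standard deviation of the mean stays the same.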


Normal distributions have many interesting properties, some of which are detailed in the 
following table. 


Property | Probability
Half of the values are less than the mean. | P(X < μ) = P(X ≤ μ) = 0.5
Half of the values are greater than the mean. | P(X > μ) = P(X ≥ μ) = 0.5
Approximately 68.26% of the values lie within 1 standard deviation of the mean. | P(μ − σ ≤ X ≤ μ + σ) = 0.6826
Approximately 95.44% of the values lie within 2 standard deviations of the mean. | P(μ − 2σ ≤ X ≤ μ + 2σ) = 0.9544
Approximately 99.72% of the values lie within 3 standard deviations of the mean. | P(μ − 3σ ≤ X ≤ μ + 3σ) = 0.9972

The probability that the values in a normal distribution lie within a certain number of standard deviations of the mean is fixed.


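In the following diagrams, the values 0, ±1 and ±2 represent numbers of standard 
deviations from the mean. 

[Figure: two normal curves. In the first, the area between −1 and 1 (from μ − σ to μ + σ) is 0.6826; in the second, the area to the left of 1 (that is, μ + σ) is 0.8413.]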


We can use the curve’s symmetry, along with the table and diagrams above, to find other 
probabilities, such as: 


We know that P(−1 ≤ X ≤ 1) = 0.6826, so 
P(X ≤ 1) = (½ × 0.6826) + 0.5 = 0.8413 = P(X ≥ −1).

We know that P(−2 ≤ X ≤ 2) = 0.9544, so 
P(X ≤ 2) = (½ × 0.9544) + 0.5 = 0.9772 = P(X ≥ −2).
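The symmetry argument above can be checked with Python's standard library, as a sketch rather than a replacement for the tables. The helper names `within_k` and `below_k` are hypothetical. The computed values are not rounded to the tables' 4 figures, so they can differ from the quoted values by 1 or 2 in the fourth decimal place.

```python
from statistics import NormalDist

Z = NormalDist(mu=0, sigma=1)  # values measured in standard deviations from the mean

def within_k(k):
    """P(-k <= Z <= k): probability of lying within k standard deviations."""
    return Z.cdf(k) - Z.cdf(-k)

def below_k(k):
    """P(Z <= k), found from within_k(k) by symmetry: half of it, plus 0.5."""
    return 0.5 * within_k(k) + 0.5

for k in (1, 2, 3):
    print(k, round(within_k(k), 3), round(below_k(k), 3))
```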


Calculated estimates of the mean and variance of the continuous random variables 
A, B and C are given in the following table. 

Random observations from each distribution were made with the following results: 
For A: 8060 out of 13 120 observations lie in the interval from 32 to 48. 

For B: 8475 out of 12 420 observations lie in the interval from 60 to 84. 

For C: 8013 out of 10974 observations lie in the interval from 112 to 134. 


Investigate this information (using the previous table showing properties and 
probabilities for normal distributions) and comment on the statement ‘The 
distributions of A, B and C are all normal’. 


The standard normal variable Z 


There are clearly an infinite number of values for the parameters of a normally distributed 
random variable. Nevertheless, most problems can be solved by transforming the random 
variable into a standard normal variable, which is denoted by Z, and which has mean 0 and 
variance 1.

Later in this section, we will see how any normal variable can be transformed to the standard normal variable by coding.

By substituting μ = 0 and σ² = 1 into the equation for the normal distribution PDF, we 
can find the equation of the PDF for Z ~ N(0, 1). This is denoted by φ(z) and its equation 
is $\phi(z) = \frac{1}{\sqrt{2\pi}} \exp\left\{-\frac{z^2}{2}\right\}$. The graph of y = φ(z) is shown below. 


The standard normal variable is Z ~ N(0, 1).

φ and Φ are the lower- and upper-case Greek letter phi. 


The mean of Z is 0. 


The axis of symmetry is a vertical line through the mean, as with every normal 
distribution. 


Z has a variance of 1 and, therefore, a standard deviation of 1. 


z = ±1, ±2 and ±3 represent values that are 1, 2 and 3 standard deviations above or below 
the mean. 


Any z < 0 represents a value that is less than the mean. 
Any z > 0 represents a value that is greater than the mean. 
For z > 3 and for z < −3, φ(z) ≈ 0. 

The area under the graph of y = φ(z) is equal to 1. 


A vertical line drawn at any value of Z divides the area under the curve into two parts: one 
representing P(Z < z) and the other representing P(Z > z). 


The value of P(Z ≤ z) is denoted by Φ(z) and, as mentioned earlier, we do not find such 
values by integration. Tables showing the value of Φ(z) for different values of z have been 
compiled and appear in the Standard normal distribution function table at the end of the 
book. In addition, some modern calculators are able to give the value of Φ(z) and the 
inverse function Φ⁻¹(z) directly. 


Although only zero and positive values of Z (i.e. z ≥ 0) appear in the tables, the graph’s 
symmetry allows us to use the tables for positive and for negative values of z, as you will 
see after Worked example 8.2. 


Values of the standard normal variable appear as 4-figure numbers from z = 0.000 to 
z = 2.999 in the tables. The first and second figures of z appear in the left-hand column; 
the third and fourth figures appear in the top row. The numbers in the ‘ADD’ column for 
the fourth figures indicate what we should add to the value of Φ(z) in the body of the table. 


Φ(z) can be found for any given value of z, and z can be found for any given value of Φ(z) 
by using the tables in reverse (as shown in Worked example 8.4). In the critical values table, 
values for Φ(z) are denoted by p. 

Critical values refer to probabilities of 75%, 90%, 95%, ... and their complements 25%, 10%, 5%, ... and so on. 


A section of the tables, from which we will find the value of Φ(0.274), is shown below. 


z (first and second figures) | Third figure: 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | Fourth figure (ADD): 1 2 3 4 5 6 7 8 9
0.0 | 0.5000 | 0.5040 | 0.5080 | 0.5120 | 0.5160 | 0.5199 | 0.5239 | 0.5279 | 0.5319 | 0.5359 | 4 8 12 16 20 24 28 32 36
0.1 | 0.5398 | 0.5438 | 0.5478 | 0.5517 | 0.5557 | 0.5596 | 0.5636 | 0.5675 | 0.5714 | 0.5753 | 4 8 12 16 20 24 28 32 36
0.2 | 0.5793 | 0.5832 | 0.5871 | 0.5910 | 0.5949 | 0.5987 | 0.6026 | 0.6064 | 0.6103 | 0.6141 | 4 8 12 15 19 23 27 31 35
0.3 | 0.6179 | 0.6217 | 0.6255 | 0.6293 | 0.6331 | 0.6368 | 0.6406 | 0.6443 | 0.6480 | 0.6517 | 4 7 11 15 19 22 26 30 34


We locate the first and second figures of z (namely 0.2) in the left-hand column. 


We then locate the third figure of z (namely 7) along the top row... this tells us that 
Φ(0.27) = 0.6064.


Next we locate the fourth figure of z (namely 4) at the top-right. In line with 0.6064, we see 
‘ADD 15’, which means that we must add 15 to the last two figures of 0.6064 to obtain the 
value of Φ(0.274). 


Φ(0.274) = 0.6064 + 0.0015 = 0.6079 


Φ(0.274) = 0.6079 can be expressed using inverse notation as Φ⁻¹(0.6079) = 0.274.
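As a rough check on the table method, the lookup above can be mirrored in Python and compared with a directly computed value of Φ(0.274). This is only a sketch: the variable names are hypothetical, and `statistics.NormalDist` stands in for a calculator's Φ function. Because the tables are rounded to 4 figures, the two values may differ slightly in the last figure.

```python
from statistics import NormalDist

# Values quoted in the text above.
table_value = 0.6064  # Phi(0.27): row 0.2, third figure 7
add_value = 15        # 'ADD' entry for a fourth figure of 4

# The ADD entry is in units of the fourth decimal place.
from_tables = table_value + add_value / 10_000

# Direct computation of Phi(0.274) for comparison.
computed = NormalDist().cdf(0.274)

print(round(from_tables, 4))
print(round(computed, 4))
```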


WORKED EXAMPLE 8.1


Given that Z ~ N(0, 1), find P(Z ≤ 1.23) and P(Z ≥ 1.23).


Answer


Φ(1.23) = 0.8907 is the area to the left of z = 1.23.

1 − Φ(1.23) = 1 − 0.8907 = 0.1093 is the area to the right of z = 1.23, as shown in the graphs.

[Figure: two standard normal curves with z = 0 and z = 1.23 marked; one shows the area to the left of z = 1.23 and the other the area to the right.]

∴ P(Z ≤ 1.23) = 0.8907 and P(Z ≥ 1.23) = 0.1093 
WORKED EXAMPLE 8.2


Given that Z ~ N(0, 1), find P(0.4 ≤ Z < 1.7) correct to 3 decimal places. 


Answer 


[Figure: standard normal curve with z = 0.4 and z = 1.7 marked.]


Φ(1.7) = 0.9554 and Φ(0.4) = 0.6554


P(0.4 ≤ Z < 1.7) = P(Z < 1.7) − P(Z < 0.4) 
= Φ(1.7) − Φ(0.4) 
= 0.9554 − 0.6554 
= 0.300 


As noted previously, the normal distribution function tables do not show values for z < 0. 
However, we can use the symmetry properties of the normal curve, and the fact that the 
area under the curve is equal to 1, to find values of Φ(z) when z is negative. 


Situations in which z > 0, and in which z < 0, are illustrated in the two diagrams below. 


For a positive value, z = b:
The shaded area in this graph represents the value of Φ(b).
[Figure: standard normal curve with 0 and b marked; the shaded area is Φ(b) = P(Z ≤ b).]
Φ(b) = P(Z ≤ b) and 1 − Φ(b) = P(Z ≥ b).


For a negative value, z = −a:
The shaded area in this graph represents the value of Φ(a).
[Figure: standard normal curve with −a and 0 marked; the shaded area is Φ(a) = P(Z ≥ −a).]
Φ(a) = P(Z ≥ −a) and 1 − Φ(a) = P(Z ≤ −a).


From the tables, the one piece of information, Φ(0.11) = 0.5438, actually tells us four 
probabilities: 


P(Z ≤ 0.11) = 0.5438 and P(Z ≥ −0.11) = 0.5438. 
P(Z ≥ 0.11) = 1 − 0.5438 = 0.4562 and P(Z ≤ −0.11) = 1 − 0.5438 = 0.4562. 
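The four probabilities above can be confirmed with a short Python sketch. The variable names are hypothetical, and `statistics.NormalDist` is used here in place of the tables.

```python
from statistics import NormalDist

Z = NormalDist()      # standard normal variable, N(0, 1)
phi = Z.cdf(0.11)     # Phi(0.11)

p_le_pos = phi                # P(Z <= 0.11)
p_ge_neg = 1 - Z.cdf(-0.11)   # P(Z >= -0.11), equal to Phi(0.11) by symmetry
p_ge_pos = 1 - phi            # P(Z >= 0.11)
p_le_neg = Z.cdf(-0.11)       # P(Z <= -0.11), equal to 1 - Phi(0.11) by symmetry

print(round(p_le_pos, 4), round(p_ge_neg, 4))
print(round(p_ge_pos, 4), round(p_le_neg, 4))
```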
Information given about probabilities in a normal distribution should always be 
transferred to a sketched graph. Useful information, such as whether a particular value of 
z is positive or negative, will then be easy to see. This could, of course, also be determined 
by considering inequalities. 


If, for example, P(Z ≥ z) > 0.5, then P(Z < z) < 0.5 and, therefore, z < 0. 


WORKED EXAMPLE 8.3


Given that Z ~ N(0, 1), find P(−1 < Z < 2.115) correct to 3 significant figures. 


Answer 


[Figure: standard normal curve with z = −1, 0 and 2.115 marked.]

P(Z < 2.115) = Φ(2.115) and P(Z < −1) = 1 − Φ(1). 

P(−1 < Z < 2.115) = Φ(2.115) − [1 − Φ(1)] 
= Φ(2.115) + Φ(1) − 1 
= 0.9828 + 0.8413 − 1 
= 0.824 


WORKED EXAMPLE 8.4


Given that Z ~ N(0, 1), find the value of a such that P(Z < a) = 0.9072. 


Answer 


In the tables, 0.9066 = Φ(1.32), and an ‘ADD’ value of 6 corresponds to a fourth figure of 4, that is, 0.004.


a = Φ⁻¹(0.9072) 
= Φ⁻¹(0.9066 + 0.0006) 
= Φ⁻¹(0.9066) + 0.004 
= 1.32 + 0.004 
= 1.324 
WORKED EXAMPLE 8.5


Given that Z ~ N(0, 1), find the value of b such that P(Z ≥ b) = 0.7713. 


Answer 


[Figure: standard normal curve with the area 0.7713 to the right of z = b, where b is negative.]


By symmetry, b = −a where P(Z ≤ a) = 0.7713, so 
a = Φ⁻¹(0.7713) 
= Φ⁻¹(0.7704 + 0.0009) 
= Φ⁻¹(0.7704) + 0.003 
= 0.740 + 0.003 
= 0.743 

∴ b = −a = −0.743 
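Worked examples 8.4 and 8.5 use the tables in reverse. As a hedged sketch, the same values can be obtained with `inv_cdf` from Python's `statistics.NormalDist`, which here plays the role of a calculator's Φ⁻¹ function; small differences from the table answers are possible in the last figure.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal variable, N(0, 1)

# Worked example 8.4: P(Z < a) = 0.9072, so a = inverse Phi of 0.9072.
a = Z.inv_cdf(0.9072)

# Worked example 8.5: P(Z >= b) = 0.7713 means P(Z < b) = 1 - 0.7713,
# so b is negative because the probability to its left is below 0.5.
b = Z.inv_cdf(1 - 0.7713)

print(round(a, 3))
print(round(b, 3))
```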


EXERCISE 8B 


1 Given that Z ~ N(0, 1), find the following probabilities correct to 3 significant figures. 


a P(Z < 0.567) b P(Z = 2.468) c P(Z > -1.53) d P(Z = -0.077) 
e P(Z > 0.817) f P(Z = 2.009) g P(Z <-1.75) h P(Z < -0.013) 
i P(Z < 1.96) j P(Z > 2.576) 


2 The random variable Z is normally distributed with mean 0 and variance 1. Find the following probabilities, 
correct to 3 significant figures. 


a P(1.5 < Z < 2.5) b P(0.046 < Z < 1.272) c P(1.645 < Z < 2.326) 
d P(−2.807 < Z < −1.282) e P(−1.777 < Z < 0.746) f P(−1.008 < Z < −0.337) 
g P(−1.2 < Z < 1.2) h P(−1.667 < Z < 2.667) i P(−4 < Z < 8) 
j P(√2 < Z < √5) 


3 Given that Z~N(0, 1), find the value of k, given that: 
a P(Z<k) = 0.9087 b P(Z < k) = 0.5442 c P(Z > k) = 0.2743 d P(Z > k) = 0.0298 
e P(Z < k)= 0.25 f P(Z < k) = 0.3552 g P(Z > k) = 0.9296 h P(Z > k) = 0.648 
i P(−k < Z < k) = 0.9128 j P(−k < Z < k) = 0.6994 
4 Find the value of c in each of the following where Z has a normal distribution with μ = 0 and σ² = 1. 


a P(c < Z < 1.638) = 0.2673 b P(c < Z < 2.878) = 0.4968 
c P(1 < Z < c) = 0.1408 d P(0.109 < Z < c) = 0.35 
e P(c < Z < 2) = 0.6687 f P(c < Z < 1.85) = 0.9516 
g P(−1.221 < Z < c) = 0.888 h P(−0.674 < Z < c) = 0.725 
i P(−2.63 < Z < c) = 0.6861 j P(−2.7 < Z < c) = 0.0252 


Standardising a normal distribution 


The probability distribution of a normally distributed random variable is represented by a 
normal curve. This curve is centred on the mean μ; the area under the curve is equal to 1, 
and its height is determined by the standard deviation σ. 


We already have a method for finding probabilities involving the standard normal variable 
Z ~ N(0, 1) using the normal distribution function tables. Fortunately, this same set of 
tables can be used to find probabilities involving any normal random variable, no matter 
what the values of μ and σ². Although we have only learnt about coding data, it turns out 
that coding works in exactly the same way for normally distributed random variables: they 
behave in the way that we expect and remain normal after coding. 


If we code X by subtracting μ, then the PDF is translated horizontally by −μ units and is 
now centred on 0. The new random variable X − μ has mean 0 and standard deviation σ. 


If we now code X − μ by multiplying by 1/σ (i.e. dividing by σ) then the standard 
deviation (and variance) of (X − μ)/σ will be equal to 1, while the mean remains 0. 


Coding the random variable X in this way is called standardising, because it transforms the 
distribution X ~ N(μ, σ²) to Z ~ N(0, 1). 


KEY POINT 8.2


When X ~ N(μ, σ²) then Z = (X − μ)/σ has a standard normal distribution. 

The coded random variable is normally distributed with mean 0 and variance 1. 


A standardised value z = (x − μ)/σ tells us how many standard deviations x is from the mean. 


Probabilities involving values of X are equal to probabilities involving the corresponding 
values of Z, which can be found from the normal distribution function tables for 
Z ~ N(0, 1). 


For example, if X ~ N(20, 9), then P(X ≤ 23) = P(Z ≤ (23 − 20)/√9).
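The example with X ~ N(20, 9) can be followed through in a short Python sketch. The function name `standardise` is hypothetical; note that it takes the variance, as in the N(μ, σ²) notation, and divides by its square root.

```python
import math
from statistics import NormalDist

def standardise(x, mu, variance):
    """Return z = (x - mu) / sigma, where sigma is the square root of the variance."""
    return (x - mu) / math.sqrt(variance)

# X ~ N(20, 9): the second parameter is the variance, so sigma = 3.
z = standardise(23, 20, 9)

# Probability found from the standard normal variable Z ~ N(0, 1).
p_via_z = NormalDist().cdf(z)

# The same probability found directly from X, for comparison.
p_direct = NormalDist(mu=20, sigma=3).cdf(23)

print(z)
print(round(p_via_z, 4))
```

Both routes give the same probability, which is the point of standardising: the value 23 is one standard deviation above the mean, so P(X ≤ 23) = Φ(1).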
In Chapter 2, Section 2.2 and in Chapter 3, Section 3.3, we saw how the coding of data by addition and/or multiplication affects the mean and the standard deviation. 


We will learn more about coding random variables in the Probability & Statistics 2 Coursebook, Chapter 3. 


In the table showing properties and probabilities of normal distributions prior to Explore 8.3, we saw that probabilities are determined by the number of standard deviations from the mean. The properties given in that table apply to all normal random variables. 
WORKED EXAMPLE 8.6 


Given that X ~ N(11, 25), find P(X < 18) correct to 3 significant figures. 


Answer 

z = (18 − 11)/√25 = 1.4

P(X < 18) = P(Z < 1.4) 
= Φ(1.4) 
= 0.919 


WORKED EXAMPLE 8.7 


Given that X ~ N(20, 7), find P(X ≤ 16.6) correct to 3 significant figures. 


Answer 

z = (16.6 − 20)/√7 = −1.285

P(X ≤ 16.6) = P(Z ≤ −1.285) 
= 1 − Φ(1.285) 
= 0.0994 


WORKED EXAMPLE 8.8 


Given that X ~ N(5, 5), find P(2 ≤ X < 9) correct to 3 significant figures. 

Answer 

[Figure: normal curve with x = 2, 5 and 9 marked, corresponding to z = −1.342, 0 and 1.789.]

Area to the right of z = 0 is 
Φ(1.789) − Φ(0) = Φ(1.789) − 0.5 
= 0.4633 

Area to the left of z = 0 is 
Φ(0) − Φ(−1.342) = 0.5 − [1 − Φ(1.342)] 
= 0.4102 


Total area = 0.4633 + 0.4102 = 0.8735 

∴ P(2 ≤ X < 9) = 0.874 

Where possible, always use a 4-figure value for z. 


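Some useful results from previous worked examples are detailed in the following graphs. 


For 0 < a < b: P(a < Z < b) = Φ(b) − Φ(a)
[Figure: standard normal curve with 0, a and b marked.]

For −a < 0 < b: P(−a < Z < b) = Φ(b) + Φ(a) − 1
[Figure: standard normal curve with −a, 0 and b marked.]

For −a < 0 < a: P(−a < Z < a) = 2Φ(a) − 1
[Figure: standard normal curve with −a, 0 and a marked.]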
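These three results can be written as small Python functions and checked against the answers to Worked examples 8.2 and 8.3. The function names are hypothetical, and `statistics.NormalDist` is used in place of the tables, so this is a sketch rather than the method used in this book.

```python
from statistics import NormalDist

Phi = NormalDist().cdf  # Phi(z) = P(Z <= z) for Z ~ N(0, 1)

def between_positive(a, b):
    """P(a < Z < b) for 0 < a < b."""
    return Phi(b) - Phi(a)

def between_mixed(a, b):
    """P(-a < Z < b) for a > 0 and b > 0, using P(Z < -a) = 1 - Phi(a)."""
    return Phi(b) + Phi(a) - 1

def between_symmetric(a):
    """P(-a < Z < a) for a > 0."""
    return 2 * Phi(a) - 1

print(round(between_positive(0.4, 1.7), 3))   # compare with Worked example 8.2
print(round(between_mixed(1, 2.115), 3))      # compare with Worked example 8.3
print(round(between_symmetric(1), 4))
```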


WORKED EXAMPLE 8.9 


Given that X ~ N(μ, σ²), P(X ≤ 10) = 0.75 and P(X ≥ 12) = 0.1, find the values of μ and σ. 

Answer 


P(X ≥ 12) < 0.5, so P(X < 12) > 0.5, which means that 12 > μ. 

P(X ≤ 10) > 0.5, which means also that 10 > μ. 

[Figure: standard normal curve with 0 and the two positive z-values marked.]

(10 − μ)/σ = Φ⁻¹(0.75) = 0.674 gives 10 − μ = 0.674σ ...... [1] 
(12 − μ)/σ = Φ⁻¹(0.9) = 1.282 gives 12 − μ = 1.282σ ...... [2] 

Subtracting [1] from [2]: 

12 − μ = 1.282σ [2] 
10 − μ = 0.674σ [1] 
2 = 0.608σ 


∴ σ = 3.29 and μ = 7.78. 
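The pair of simultaneous equations in Worked example 8.9 can be solved in a few lines of Python, shown here as a sketch. The z-values come from `inv_cdf` rather than from the critical values table, so they carry more figures than 0.674 and 1.282, and the results may differ from the table-based answers in the third significant figure.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal variable, N(0, 1)

# P(X <= 10) = 0.75 gives (10 - mu) / sigma = z1.
z1 = Z.inv_cdf(0.75)

# P(X >= 12) = 0.1 means P(X < 12) = 0.9, giving (12 - mu) / sigma = z2.
z2 = Z.inv_cdf(1 - 0.1)

# Subtracting the two equations eliminates mu: 12 - 10 = (z2 - z1) * sigma.
sigma = (12 - 10) / (z2 - z1)

# Substitute back into 10 - mu = z1 * sigma.
mu = 10 - z1 * sigma

print(round(sigma, 2), round(mu, 2))
```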


EXERCISE 8C 


1 Standardise the appropriate value(s) of the normal variable X represented in each diagram, and find the 
required probabilities correct to 3 significant figures. 


a Find P(X ≤ 11), given that X ~ N(8, 25). 
[Figure: normal curve with x = 8 and x = 11 marked above a z-axis.]

b Find P(X < 69.1), given that X ~ N(72, 11). 
[Figure: normal curve with x = 69.1 and x = 72 marked above a z-axis, with z = 0 at x = 72.]

From this point in the exercise, you are strongly advised to sketch a diagram to help answer each question. 
2 Calculate the required probabilities correct to 3 significant figures. 
a Find P(X < 9.7) and P(X > 9.7), given that X ~ N(6.2, 6.25). 
b Find P(X < 5) and P(X > 5), given that X ~ N(3, 49). 
c Find P(X > 33.4) and P(X < 33.4), given that X ~ N(37, 4). 
d Find P(X < 13.5) and P(X ≥ 13.5), given that X ~ N(20, 15). 
e Find P(X > 91) and P(X < 91), given that X ~ N(80, 375). 
f Find P(1 < X < 21), given that X ~ N(11, 25). 
g Find P(2 < X < 5), given that X ~ N(3, 7). 
h Find P(6.2 ≤ X ≤ 8.8), given that X ~ N(7, 1.44). [Read carefully.] 
i Find P(26 < X < 28), given that X ~ N(25, 6). 
j Find P(8 < X < 10), given that X ~ N(12, 2.56). 
3 a Find a, given that X ~ N(30, 16) and that P(X < a) = 0.8944. 
b Find b, given that X ~ N(12, 4) and that P(X < b) = 0.9599. 
c Find c, given that X ~ N(23, 9) and that P(X > c) = 0.9332. 
d Find d, given that X ~ N(17, 25) and that P(X > d) = 0.0951. 
e Find e, given that X ~ N(100, 64) and that P(X > e) = 0.95. 
4 a Find f, given that X ~ N(10, 7) and that P(f < X < 13.3) = 0.1922. 
b Find g, given that X ~ N(45, 50) and that P(g < X < 55) = 0.5486. 
c Find h, given that X ~ N(7, 2) and that P(8 < X < h) = 0.216. 
d Find j, given that X ~ N(20, 11) and that P(j ≤ X < 22) = 0.5. 
5 X is normally distributed with mean 4 and variance 6. Find the probability that X takes a negative value. 
6 Given that Y~N ( i, 4 1°) where u > 0, find P(X < 2p). 

7 If T ~ N(10, σ²) and P(T > 14.7) = 0.04, find the value of σ. 

8 It is given that V ~ N(μ, 13) and P(V < 15) = 0.75. Find the value of μ. 

9 The variable W ~ N(μ, σ²). Given that μ = 4σ and P(W < 83) = 0.95, find the value of μ and of σ. 

10 X has a normal distribution in which σ = μ − 30 and P(X ≥ 12) = 0.9. Find the value of μ and of σ. 


11 The variable Q ~ N(μ, σ²). Given that P(Q < 1.288) = 0.281 and P(Q < 6.472) = 0.591, find the value of μ 
and of σ, and calculate P(4 < Q < 5). 


12 For the variable V ~ N(μ, σ²), it is given that P(V < 8.4) = 0.7509 and P(V > 9.2) = 0.1385. 
Find the value of μ and of σ, and calculate P(V < 10). 


13 Find the value of μ and of σ and calculate P(W > 6.48) for the variable W ~ N(μ, σ²), given that 
P(W = 4.75) = 0.6858 and P(W < 2.25) = 0.0489. 


14 X has a normal distribution, such that P(X > 147.0) = 0.0136 and P(X < 59.0) = 0.0038. 


Use this information to calculate the probability that 80.0 ≤ X < 130.0. 
8.3 Modelling with the normal distribution 


The German mathematician Carl Friedrich Gauss showed that measurement errors made 
in astronomical observations were well modelled by a normal distribution, and the Belgian 
statistician and sociologist Adolphe Quételet later applied this to human characteristics 
when he saw that distributions of such things as height, weight, girth and strength were 
approximately normal. 


We are now in a position to apply our knowledge to real-life situations, and to solve more 
advanced problems involving the normal distribution. 


WORKED EXAMPLE 8.10 


The mass of a newborn baby in a certain region is normally distributed with mean 
3.35 kg and variance 0.0858 kg². Estimate how many of the 1356 babies born last year 
had masses of less than 3.5 kg. 


Answer 

Φ(z) = Φ((3.5 − 3.35)/√0.0858) 
= Φ(0.512) 
= 0.6957 

P(mass < 3.5 kg) = 0.6957 

69.57% of 1356 = 943.3692 

We cannot know the exact number of newborn babies from the model because it only gives estimates. However, we do know that the number of babies must be an integer. 

∴ We estimate that about 943 babies had masses of less than 3.5 kg. 
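The estimate in Worked example 8.10 can be reproduced with a brief Python sketch. The variable names are hypothetical and `statistics.NormalDist` replaces the tables, so the probability carries more figures than 0.6957 before the final rounding to a whole number of babies.

```python
import math
from statistics import NormalDist

mean_mass = 3.35   # kg
variance = 0.0858  # kg squared
babies = 1356

# NormalDist expects the standard deviation, not the variance.
model = NormalDist(mu=mean_mass, sigma=math.sqrt(variance))

# P(mass < 3.5 kg) under the model.
p = model.cdf(3.5)

# The model gives an estimate only, which is rounded to a whole number of babies.
estimate = round(p * babies)

print(round(p, 4))
print(estimate)
```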


WORKED EXAMPLE 8.11


A factory produces half-litre tins of oil. The volume of oil in a tin is normally 
distributed with mean 506.18 ml and standard deviation 2.96 ml. 


a What percentage of the tins contain less than half a litre of oil? 


b Find the probability that exactly 1 out of 3 randomly selected tins contains less 
than half a litre of oil. 


Answer 


a z = (500 − 506.18)/2.96 = −2.088

[Figure: normal curve with x = 500 and x = 506.18 marked, corresponding to z = −2.088 and z = 0.]

P(X < 500) = P(Z < −2.088) 
= 1 − Φ(2.088) 
= 1 − 0.9816 
= 0.0184 


∴ 1.84% of the tins contain less than half a litre of oil. 


b Let the discrete random variable Y be the number of tins containing less than half a litre 
of oil, then Y ~ B(3, 0.0184). 

P(Y = 1) = ³C₁ × 0.0184¹ × 0.9816² 
= 0.0532 

A probability obtained from a normal distribution can be used as the parameter p in a binomial distribution. 
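The two-stage calculation in this worked example (a normal probability used as the parameter p of a binomial distribution) might be sketched in Python as follows. The variable names are hypothetical, and the normal probability is computed with `statistics.NormalDist` rather than read from the tables.

```python
from math import comb
from statistics import NormalDist

# Volume of oil in a tin, in ml.
volume = NormalDist(mu=506.18, sigma=2.96)

# Part a: probability that a tin contains less than half a litre (500 ml).
p = volume.cdf(500)

# Part b: Y ~ B(3, p), so P(Y = 1) = 3C1 * p^1 * (1 - p)^2.
n = 3
p_exactly_one = comb(n, 1) * p ** 1 * (1 - p) ** (n - 1)

print(round(100 * p, 2))        # percentage of tins under half a litre
print(round(p_exactly_one, 4))
```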


EXERCISE 8D 


1 The length of a bolt produced by a machine is normally distributed with mean 18.5 cm and variance 0.7 cm². 
Find the probability that a randomly selected bolt is less than 18.85 cm long. 


The waiting times, in minutes, for patients at a clinic are normally distributed with mean 13 and variance 16. 
a Calculate the probability that a randomly selected patient has to wait for more than 16.5 minutes. 


b Last month 468 patients attended the clinic. Calculate an estimate of the number who waited for less than 
9 minutes. 


Tomatoes from a certain producer have masses which are normally distributed with mean 90 grams and 
standard deviation 17.7 grams. The tomatoes are sorted into three categories by mass, as follows: 


Small: under 80 g; Medium: 80 g to 104g; Large: over 104g. 
a Find, correct to 2 decimal places, the percentage of tomatoes in each of the three categories. 


b Find the value of k such that P(k ≤ X < 104) = 0.75, where X is the mass of a tomato in grams. 


The heights, in metres, of the trees in a forest are normally distributed with mean u and standard deviation 3.6. 
Given that 75% of the trees are less than 10m high, find the value of u. 


The mass of a certain species of fish caught at sea is normally distributed with mean 5.73 kg and variance 
2.56 kg². Find the probability that a randomly selected fish caught at sea has a mass that is: 


a less than 6.0kg b more than 3.9kg c between 7.0 and 8.0 kg 


The distance that children at a large school can hop in 15 minutes is normally distributed with mean 199 m 
and variance 3700 m². 


a Calculate an estimate of b, given that only 25% of the children hopped further than b metres. 


b Find an estimate of the interquartile range of the distances hopped. 


The daily percentage change in the value of a company’s shares is expected to be normally distributed with 
mean 0 and standard deviation 0.51. On how many of the next 365 working days should the company expect 
the value of its shares to fall by more than 1%? 


The masses, w grams, of a large sample of apples are normally distributed with mean 200 and variance 169. 
Given that the masses of 3413 apples are in the range 187 < w < 213, calculate an estimate of the number of 
apples in the sample. 
The ages of the children in a gymnastics club are normally distributed with mean 15.2 years and standard 
deviation σ. Find the value of σ given that 30.5% of the children are less than 13.5 years of age. 


The speeds, in km h⁻¹, of vehicles passing a particular point on a rural road are normally distributed with 
mean μ and standard deviation 20. Find the value of μ and find what percentage of the vehicles are being 
driven at under 80 km h⁻¹, given that 33% of the vehicles are being driven at over 100 km h⁻¹. 


Coffee beans are packed into bags by the workers on a farm, and each bag claims to contain 200 g. The actual 
mass of coffee beans in a bag is normally distributed with mean 210 g and standard deviation σ. The farm 
owner informs the workers that they must repack any bag containing less than 200 g of coffee beans. Find the 
value of σ, given that 0.5% of the bags must be repacked. 


Colleen exercises at home every day. The length of time she does this is normally distributed with mean 12.8
minutes and standard deviation σ. She exercises for more than 15 minutes on 42 days in a year of 365 days.

a Calculate the value of σ.

b On how many days in a year would you expect Colleen to exercise for less than 10 minutes?

The times taken by 15-year-olds to solve a certain puzzle are normally distributed with mean μ and standard
deviation 7.42 minutes.

a Find the value of μ, given that three-quarters of all 15-year-olds take over 20 minutes to solve the puzzle.

b Calculate an estimate of the value of n, given that 250 children in a random sample of n 15-year-olds fail to
solve the puzzle in less than 30 minutes.


The lengths, X cm, of the leaves of a particular species of tree are normally distributed with mean μ and variance σ².


a Find P(μ − σ < X < μ + σ).


b Find the probability that a randomly selected leaf from this species has a length that is more than
2 standard deviations from the mean.


c Find the value of μ and of σ, given that P(X < 7.5) = 0.75 and P(X < 8.5) = 0.90.

The time taken in seconds for Ginger’s computer to open a specific large document is normally distributed 
with mean 9 and variance 5.91. 

a Find the probability that it takes exactly 5 seconds or more to open the document. 

b Ginger opens the document on her computer on n occasions. The probability that it fails to open in less 
than exactly 5 seconds on at least one occasion is greater than 0.5. Find the least possible value of n. 

The masses of all the different pies sold at a market are normally distributed with mean 400 g and standard
deviation 61 g. Find the probability that:

a the mass of a randomly selected pie is less than 425 g

b 4 randomly selected pies all have masses of less than 425 g

c exactly 7 out of 10 randomly selected pies have masses of less than 425 g.

The height of a female university student is normally distributed with mean 1.74 m and standard deviation
12.3 cm. Find the probability that:

a a randomly selected female student is between 1.71 and 1.80 metres tall

b 3 randomly selected female students are all between 1.71 and 1.80 m tall

c exactly 15 out of 50 randomly selected female students are between 1.71 and 1.80 metres tall.


8.4 The normal approximation to the binomial distribution


In Chapter 7, Section 7.1, we saw that the binomial distribution can be used to solve
problems such as ‘Find the probability of obtaining exactly 60 heads with 100 tosses of a
fair coin’, and that this is equal to $\binom{100}{60} \times 0.5^{60} \times 0.5^{40}$. Therefore, to find the probability
of obtaining 60 or more heads, we must find the probability for 60 heads, for 61 heads, for
62 heads and so on, and add them all together.


Imagine how long it took to calculate binomial probabilities before calculators and computers! 


However, in certain situations, we can approximate a probability such as this by a method 
that involves far fewer calculations using the normal distribution. 
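
To make the saving concrete, the following Python sketch (standard library only) finds the probability of 60 or more heads in 100 tosses in two ways: by adding all 41 binomial terms, and by one normal calculation. The function names are illustrative and do not come from the coursebook, and the value 59.5 is the continuity correction that is explained later in this section.

```python
from math import comb, erf, sqrt


def binomial_tail(n, p, k):
    """Exact P(X >= k) for X ~ B(n, p): one binomial term per outcome k..n."""
    return sum(comb(n, r) * p**r * (1 - p) ** (n - r) for r in range(k, n + 1))


def phi(z):
    """Standard normal distribution function, written with the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))


n, p, k = 100, 0.5, 60

# Long method: 41 separate binomial probabilities added together.
exact = binomial_tail(n, p, k)

# Short method: a single normal calculation with mean np and variance npq.
mu = n * p
sigma = sqrt(n * p * (1 - p))
approx = 1 - phi((k - 0.5 - mu) / sigma)

print(f"exact = {exact:.4f}, normal approximation = {approx:.4f}")
```

The two results should agree closely, which is the point of the approximation: one standardisation replaces a long sum.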


EXPLORE 8.4 


Binomial probability distributions for 2, 4, and 12 tosses of a fair coin are shown in 
the following diagrams. Notice that, as the number of coin tosses increases, the shape 
of the probability distribution becomes increasingly normal. 


[Figure: binomial probability distributions for p = 0.5 with n = 2, n = 4 and n = 12]


Does the binomial probability distribution maintain its normal shape for large
values of n when p varies? Find out using the Binomial Distribution resource on the
GeoGebra website.


Select any n ≥ 20 then use the pause/play button or the slider to vary the value of p. Take
note of when the distribution loses its normal shape. Repeat this for other values of n.


Can you generalise as to when the binomial distribution begins to lose its normal shape? 


Abraham de Moivre, the 18th century statistician and
consultant, was often asked to make long calculations
concerning games of chance. He noted that when the
number of events increased, the shape of the binomial
distribution approached a very smooth curve, and saw that
he would be able to solve these long calculation problems
if he could find a mathematical expression for this curve:
this is exactly what he did. The curve he discovered is now
called the normal curve.


Before the late 1870s, when the term normal was
coined independently by Peirce, Galton and Lexis, this
distribution was known (and still is by some) as the
Gaussian distribution after the German mathematician
Carl Friedrich Gauss. The word normal is not meant to
suggest that all other distributions are abnormal!

[Portrait: Abraham de Moivre, 1667–1754]


de Moivre’s theorem, (cos θ + i sin θ)ⁿ = cos nθ + i sin nθ, links trigonometry with
complex numbers, a topic that we cover in the Pure Mathematics 2 & 3 Coursebook,
Chapter 11.


Chapter 8: The normal distribution 


The following diagrams show the shapes of four binomial distributions for n = 25. 


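[Figure: four binomial distributions for n = 25, each plotted for values from 0 to 25]

p = 0.15, q = 0.85: np = 3.75, nq = 21.25
p = 0.35, q = 0.65: np = 8.75, nq = 16.25
p = 0.75, q = 0.25: np = 18.75, nq = 6.25
p = 0.95, q = 0.05: np = 23.75, nq = 1.25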


As you can see, the binomial distribution loses its normal shape when p is small and also 
when q is small. 


A more detailed investigation shows that the binomial distribution has an approximately
normal shape if np and nq are both greater than 5. These are the values that we use to
decide whether a binomial distribution can be well-approximated by a normal distribution.
The larger the values of np and nq, the more accurate the approximation will be. As we
can see from the above diagrams, the approximation is adequate (but not very good) when
np = 18.75 and nq = 6.25.


The distribution X ~ B(40, 0.9) cannot be well-approximated by a normal distribution
because nq = 4, which is less than 5.


The distribution X ~ B(250, 0.2) can be well-approximated by a normal distribution
because np = 50 and nq = 200, both of which are substantially greater than 5.
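
The test described above can be written as a short check. The Python sketch below is only an illustration (the function names are our own, not the coursebook's): it evaluates np and nq, applies the "both greater than 5" rule, and returns the mean and variance that the approximating normal distribution would have.

```python
def approximation_check(n, p):
    """Return (np, nq, ok), where ok is True when np > 5 and nq > 5."""
    q = 1 - p
    np_value, nq_value = n * p, n * q
    return np_value, nq_value, (np_value > 5 and nq_value > 5)


def normal_parameters(n, p):
    """Mean np and variance npq of the approximating normal distribution."""
    return n * p, n * p * (1 - p)


# The two distributions discussed in the text.
for n, p in [(40, 0.9), (250, 0.2)]:
    np_value, nq_value, ok = approximation_check(n, p)
    print(f"B({n}, {p}): np = {np_value:.2f}, nq = {nq_value:.2f}, suitable = {ok}")
```

For B(250, 0.2), `normal_parameters(250, 0.2)` gives a mean of 50 and a variance of 40 (up to floating-point rounding).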


When we approximate a discrete distribution by a continuous distribution, a discrete value
such as X = 13 must be treated as being represented by the class of continuous values
12.5 ≤ X < 13.5. For this reason, X = 13 must be replaced by either X = 12.5 or by
X = 13.5 in our probability calculations. Making this replacement is known as ‘making a
continuity correction’. Deciding whether to use X = 12.5 or X = 13.5 depends on whether
or not X = 13 is included in the probability that we wish to find.


For example, if we wish to find P(X < 13), where X = 13 is not included, we calculate
using X = 12.5.


If we wish to find P(X ≤ 13), where X = 13 is included, we calculate using X = 13.5.


Further details of continuity corrections are given in Worked example 8.12. 
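
The rule for choosing the boundary can be summarised as a small lookup. The Python sketch below is a hedged illustration (the function name and the text labels for the inequalities are our own): it returns the value that replaces a whole number k, depending on whether k itself is included in the required probability.

```python
def continuity_corrected(relation, k):
    """Boundary that replaces the discrete value k in a normal approximation.

    relation is one of "<", "<=", ">", ">=" and describes the discrete
    probability required, for example P(X < k) or P(X >= k).
    """
    if relation in ("<", ">="):
        # P(X < k) excludes k; P(X >= k) includes k: use the lower boundary.
        return k - 0.5
    if relation in ("<=", ">"):
        # P(X <= k) includes k; P(X > k) excludes k: use the upper boundary.
        return k + 0.5
    raise ValueError(f"unsupported relation: {relation!r}")


print(continuity_corrected("<", 13))   # boundary for P(X < 13)
print(continuity_corrected("<=", 13))  # boundary for P(X <= 13)
```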


WORKED EXAMPLE 8.12 


Given that X ~ B(100, 0.4), use a suitable approximation and continuity correction
to find: 


a P(X < 43) 


b P(X > 43) 




KEY POINT 8.5

X ~ B(n, p) can be approximated by N(μ, σ²), where μ = np and σ² = npq,
provided that n is large enough to ensure that np > 5 and nq > 5.

Continuity corrections must be made when a discrete distribution is approximated
by a continuous distribution.


Answer

μ = np = 40 and σ² = npq = 24.
X ~ B(100, 0.4) can be
approximated by N(40, 24).


[Figure: diagram in which the discrete value 43 is represented by the interval from 42.5 to 43.5 on the x-axis, between the values 42 and 44]


Possible continuity corrections for a discrete value of 43 are given below: 


For P(X < 43), we would use the lower boundary value 42.5 [part a]
For P(X ≤ 43), we would use the upper boundary value 43.5


For P(X > 43), we would use the upper boundary value 43.5 [part b]
For P(X ≥ 43), we would use the lower boundary value 42.5


a P(X < 43) ≈ P(Z < (42.5 − 40)/√24)
= P(Z < 0.510)
= Φ(0.510)
= 0.6950
∴ P(X < 43) ≈ 0.695


b P(X > 43) ≈ P(Z > (43.5 − 40)/√24)
= P(Z > 0.714)
= 1 − Φ(0.714)
= 0.2377
∴ P(X > 43) ≈ 0.238
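
The two answers in Worked example 8.12 can be checked numerically. The Python sketch below is an illustration under our own naming: it repeats the normal calculations with the boundaries 42.5 and 43.5, then compares them with the exact binomial sums, which a computer can evaluate directly.

```python
from math import comb, erf, sqrt


def phi(z):
    """Standard normal distribution function."""
    return 0.5 * (1 + erf(z / sqrt(2)))


def binomial_pmf(n, p, r):
    """P(X = r) for X ~ B(n, p)."""
    return comb(n, r) * p**r * (1 - p) ** (n - r)


n, p = 100, 0.4
mu = n * p                      # 40
sigma = sqrt(n * p * (1 - p))   # square root of 24

# Normal approximations with continuity corrections.
approx_a = phi((42.5 - mu) / sigma)        # P(X < 43)
approx_b = 1 - phi((43.5 - mu) / sigma)    # P(X > 43)

# Exact binomial values for comparison.
exact_a = sum(binomial_pmf(n, p, r) for r in range(0, 43))       # r = 0, ..., 42
exact_b = sum(binomial_pmf(n, p, r) for r in range(44, n + 1))   # r = 44, ..., 100

print(f"a: approximation {approx_a:.4f}, exact {exact_a:.4f}")
print(f"b: approximation {approx_b:.4f}, exact {exact_b:.4f}")
```

Small differences from the tabulated answers are to be expected, because the worked example rounds z to three decimal places before using the table.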


WORKED EXAMPLE 8.13


Boxes are packed with 8000 randomly selected items. It is known that 0.2% of the 
items are yellow. 


Find, using a suitable approximation, the probability that: 
a a box contains fewer than 20 yellow items


b exactly 2 out of 3 randomly selected boxes contain fewer than 20 yellow items.


We will also make continuity corrections when using the normal distribution as an
approximation to the Poisson distribution in the Probability & Statistics 2
Coursebook, Chapter 2.


X < a means ‘X is fewer/less than a’.

X > a means ‘X is more/greater than a’.

X ≤ a means ‘X is at most a’ and ‘X is not more than a’ and ‘X is a or less’.

X ≥ a means ‘X is at least a’ and ‘X is not less than a’ and ‘X is a or more’.




Answer

a X ~ B(8000, 0.002) can be approximated by N(16, 15.968), because np = 16 and
nq = 7984 are both greater than 5.

Do not forget to make the continuity correction!

P(X < 20) ≈ P(Z < (19.5 − 16)/√15.968)
= P(Z < 0.87587...)
= Φ(0.876)
= 0.8094
∴ The probability that a box contains fewer than 20 yellow items
is approximately 0.809.

b Let Y be the number of boxes, out of the 3 selected, that contain fewer than 20 yellow items.

P(Y = 2) = $\binom{3}{2}$ × 0.8094² × 0.1906¹
= 0.375

Although our answer to part a is only an approximation, we should not use a
rounded probability, such as 0.8, in further calculations.
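
The advice about rounding in Worked example 8.13 can be illustrated numerically. The Python sketch below uses our own variable names and is not part of the coursebook's method: it finds the probability for one box, then compares the answer to part b with the answer that would result from rounding that probability to 0.8 first.

```python
from math import comb, erf, sqrt


def phi(z):
    """Standard normal distribution function."""
    return 0.5 * (1 + erf(z / sqrt(2)))


def binomial_pmf(n, p, r):
    """P(Y = r) for Y ~ B(n, p)."""
    return comb(n, r) * p**r * (1 - p) ** (n - r)


# Part a: X ~ B(8000, 0.002) approximated by N(16, 15.968).
n, p = 8000, 0.002
mu = n * p
variance = n * p * (1 - p)
p_box = phi((19.5 - mu) / sqrt(variance))   # P(X < 20) with continuity correction

# Part b: exactly 2 of 3 boxes, using the unrounded probability from part a.
p_two_of_three = binomial_pmf(3, p_box, 2)

# The same calculation with the probability rounded to 0.8 beforehand.
p_rounded_early = binomial_pmf(3, 0.8, 2)

print(f"P(box has fewer than 20) = {p_box:.4f}")
print(f"part b = {p_two_of_three:.3f}, with early rounding = {p_rounded_early:.3f}")
```

Rounding to 0.8 before part b changes the final answer in the second decimal place, which is why the unrounded value is carried forward.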


WORKED EXAMPLE 8.14 


A fair coin is tossed 888 times. Find, by use of a suitable approximation, the probability that the coin lands 
heads-up at most 450 times. 


Answer

X ~ B(888, 0.5) can be approximated by N(444, 222).

To find P(X ≤ 450) we calculate with x = 450.5.

P(X ≤ 450) ≈ P(Z ≤ (450.5 − 444)/√222)
= P(Z ≤ 0.436)
= Φ(0.436)
= 0.6686


∴ P(at most 450 heads) ≈ 0.669


EXPLORE 8.5


By visiting the Binomial and Normal resource on the GeoGebra website you will get
a clear picture of how the normal approximation to the binomial distribution works.


Select values of n and p so that np and n(1 − p) are both greater than 5.


The binomial probability distribution is displayed with an overlaid normal curve
(the values of μ = np and σ = √(np(1 − p)) are displayed in red at the top-right). If
you then check the probability box, adjustable values of x = a and x = b appear on the
diagram, with the area between them shaded. Remember that a discrete variable is
being approximated by a continuous variable, so appropriate continuity corrections
are needed to find the best probability estimates.
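
The effect of the continuity correction can also be seen without the GeoGebra resource. The Python sketch below is an illustration with our own variable names: for the 888 coin tosses of Worked example 8.14, it compares the exact binomial probability with the normal approximation, both with and without the correction. Because the coin is fair, the exact value is a sum of binomial coefficients divided by 2⁸⁸⁸, which Python can evaluate with whole-number arithmetic.

```python
from math import comb, erf, sqrt


def phi(z):
    """Standard normal distribution function."""
    return 0.5 * (1 + erf(z / sqrt(2)))


n = 888
mu = n * 0.5                  # 444
sigma = sqrt(n * 0.5 * 0.5)   # square root of 222

# Exact P(X <= 450): count the favourable outcomes, then divide by 2^888.
exact = sum(comb(n, r) for r in range(0, 451)) / 2**n

# Normal approximation with and without the continuity correction.
corrected = phi((450.5 - mu) / sigma)
uncorrected = phi((450 - mu) / sigma)

print(f"exact       = {exact:.4f}")
print(f"corrected   = {corrected:.4f}")
print(f"uncorrected = {uncorrected:.4f}")
```

The corrected value should lie much closer to the exact probability than the uncorrected one.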
EXERCISE 8E 


1 Decide whether or not each of the following binomial distributions can be well-approximated by a normal
distribution.

For those that can, state the values of the parameters μ and σ².

For those that cannot, state the reason.

a B(20, 0.6) b B(30, 0.95) c B(40, 0.13) d B(50, 0.06)

2 Find the smallest possible value of n for which the following binomial distributions can be well-approximated
by a normal distribution.

a B(n, 0.024) b B(n, 0.15) c B(n, 0.52) d B(n, 0.7)

3 Describe the binomial distribution that can be approximated by the normal distribution N(14, 10.5).

4 By first evaluating np and npq, use a suitable approximation and continuity correction to find P(X < 75) for
the discrete random variable X ~ B(100, 0.7).

5 The discrete random variable Y ~ B(50, 0.6). Use a suitable approximation and continuity correction to find
P(Y > 26).

6 A biased coin is tossed 160 times. The number of heads obtained, H, follows a binomial distribution where
E(H) = 100. Find:

a the value of p and the variance of H

b the approximate probability of obtaining more than 110 heads.

7 One card is selected at random from each of 40 packs. Each pack contains 52 cards and includes 13 clubs. Let
C be the number of clubs selected from the 40 packs.

a Show that the variance of C is 7.5.

b Obtain an approximation for the value of P(C < 8), and justify the use of this approximation.

8 In a large survey, 55% of the people questioned are in full-time employment. In a random sample of 80 of
these people, find:

a the expected number in full-time employment
b the standard deviation of the number in full-time employment
c the approximate probability that fewer than half of the sample are in full-time employment.
9 A company manufactures rubber and plastic washers in the ratio 4:1. The washers are randomly packed into 
boxes of 25. 
a Find the probability that a randomly selected box contains: 
i exactly 21 rubber washers ii exactly 10 plastic washers. 


b A retail pack contains 2000 washers. Find the expectation and variance of the number of rubber washers 
in a retail pack. 


c Using a suitable approximation, find the probability that a retail pack contains at most 1620 rubber 
washers. 
10 In a certain town, 63% of homes have an internet connection. 
a In a random sample of 20 homes in this town, find the probability that:
i exactly 15 have an internet connection 
ii exactly nine do not have an internet connection. 


b Use a suitable approximation to find the probability that more than 65% of a random sample of 600 homes 
in this town have an internet connection. 


11 17% of the people interviewed in a survey said they watch more than two hours of TV per day. A random 
sample of 300 of those who were interviewed is taken. Find an approximate value for the probability that at 
least one-fifth of those in the sample watch more than two hours of TV per day. 


12 An opinion poll was taken before an election. The table shows the percentage of voters who said they would 
vote for parties A, B and C. 


Party            A    B    C
Percentage (%)   36   41   23


Find an approximation for the probability that, in a random sample of 120 of these voters: 
a exactly 50 said they would vote for party B 
b more than 70 but fewer than 90 said they would vote for party B or party C. 
13 Boxes containing 24 floor tiles are loaded into vans for distribution. In a load of 80 boxes there are, on 
average, three damaged floor tiles. Find, approximately, the probability that: 
a there are more than 65 damaged tiles in a load of 1600 boxes 


b in five loads, each containing 1600 boxes, exactly three loads contain more than 65 damaged tiles. 


14 It is known that 2% of the cheapest memory sticks on the market are defective.


a In a random sample of 400 of these memory sticks, find approximately the probability that at least five but
at most 11 are defective.


b Ten samples of 400 memory sticks are tested. Find an approximate value for the probability that there are
fewer than 12 defective memory sticks in more than seven of the samples.


15 Randomly selected members of the public were asked whether they approved of plans to build a new sports
centre and 57% said they approved. Find approximately the probability that more than 75 out of 120 people
said they approved, given that at least 60 said they approved.


16 A fair coin is tossed 400 times. Given that it shows a head on more than 205 occasions, find an approximate 
value for the probability that it shows a head on fewer than 215 occasions. 


17 An ordinary fair die is rolled 450 times. Given that a 6 is rolled on fewer than 80 occasions, find
approximately the probability that a 6 is rolled on at least 70 occasions. 


Checklist of learning and understanding

A continuous random variable can take any value, possibly within a range, and those values occur
by chance in a certain random manner.

The probability distribution of a continuous random variable is represented by a function called a
probability density function or PDF.

A normally distributed random variable X is described by its mean and variance as X ~ N(μ, σ²).

The standard normal random variable is Z ~ N(0, 1).

When X ~ N(μ, σ²) then Z = (X − μ)/σ has a standard normal distribution, and the standardised value
z = (x − μ)/σ tells us how many standard deviations x is from the mean.

X ~ B(n, p) can be approximated by N(μ, σ²), where μ = np and σ² = npq, provided that n is large
enough to ensure that np > 5 and nq > 5.

np > 5 and nq > 5 are the necessary conditions for making this approximation, and larger values
of np and nq result in better approximations.

Continuity corrections must be made when a discrete distribution is approximated by a continuous
distribution.




1 A continuous random variable, X, has a normal distribution with mean 8 and standard deviation σ.
Given that P(X > 5) = 0.9772, find P(X < 9.5). [3]


2 The variable Y is normally distributed. Given that 10σ = 3μ and P(Y < 10) = 0.75, find P(Y = 6). [4]


3 In Scotland, in November, on average 80% of days are cloudy. Assume that the weather on any one day is
independent of the weather on other days.


i Use a normal approximation to find the probability of there being fewer than 25 cloudy days in
Scotland in November (30 days). [4]


ii Give a reason why the use of a normal approximation is justified. [1] 
Cambridge International AS & A Level Mathematics 9709 Paper 62 Q2 June 2011 


4 At a store, it is known that 1 out of every 9 customers uses a gift voucher in part-payment for purchases.
A randomly selected sample of 72 customers is taken. Use a suitable approximation and continuity 
correction to find the probability that at most 6 of these customers use a gift voucher in part-payment for their 
purchases. [5] 


5 A survey shows that 54% of parents believe mathematics to be the most important subject that their
children study. Use a suitable approximation to find the probability that at least 30 out of a sample of 
50 parents believe mathematics to be the most important subject studied. [5] 


6 Two normally distributed continuous random variables are X and Y. It is given that X ~ N(1.5, 0.2²) and
that Y ~ N(2.0, 0.5²). On the same diagram, sketch graphs showing the probability density functions of X
and of Y. Indicate the line of symmetry of each clearly labelled graph. [3]


7 The random variable X is such that X ~ N(82, 126).


i A value of X is chosen at random and rounded to the nearest whole number. Find the probability that
this whole number is 84. [3]
ii Five independent observations of X are taken. Find the probability that at most one of them is greater
than 87. [4]
iii Find the value of k such that P(87 < X < k) = 0.3. [5]


8 a A petrol station finds that its daily sales, in litres, are normally distributed with mean 4520 and standard
deviation 560.


i Find on how many days of the year (365 days) the daily sales can be expected to exceed 3900 litres. [4] 


The daily sales at another petrol station are X litres, where X is normally distributed with mean m and 
standard deviation 560. It is given that P(X > 8000) = 0.122. 


ii Find the value of m. [3] 


iii Find the probability that daily sales at this petrol station exceed 8000 litres on fewer than 2 of 
6 randomly chosen days. [3] 


b The random variable Y is normally distributed with mean μ and standard deviation σ. Given that
o= 2 H, find the probability that a random value of Y is less than 2u. [3] 
9 V and W are continuous random variables. V ~ N(9, 16) and W ~ N(6, σ²). Find the value of σ,
given that P(W < 8) = 2 × P(V < 8). [4]


10 The masses, in kilograms, of ‘giant Botswana cabbages’ have a normal distribution with mean μ and
standard deviation 0.75. It is given that 35.2% of the cabbages have a mass of less than 3 kg. Find the
value of μ and the percentage of cabbages with masses of less than 3.5 kg. [5]


11 The ages of the vehicles owned by a large fleet-hire company are normally distributed with mean 43 months 
and standard deviation σ. The probability that a randomly chosen vehicle is more than 4 7 years old is 0.28.
Find what percentage of the company’s vehicles are less than two years old. [5] 


12 The weights, X grams, of bars of soap are normally distributed with mean 125 grams and standard
deviation 4.2 grams.


i Find the probability that a randomly chosen bar of soap weighs more than 128 grams. [3]
ii Find the value of k such that P(k < X < 128) = 0.7465. [4]
iii Five bars of soap are chosen at random. Find the probability that more than two of the bars each
weigh more than 128 grams. [4]


13 Crates of tea should contain 200 kg, but it is known that 1 out of 45 crates, on average, is underweight.
A sample of 530 crates is selected at random.


a Find the probability that more than 12 but fewer than 17 crates are underweight. [5]
b Given that more than 12 but fewer than 17 crates are underweight, find the probability that more than
14 crates are underweight. [5]


14 Once a week, Haziq rows his boat from the island where he lives to the mainland. The journey time,
X minutes, is normally distributed with mean μ and variance σ².

a Given that P(20 < X < 30) = 0.32 and that P(X < 20) = 0.63, find the values of μ and σ². [4]


b The time taken for Haziq to row back home, Y minutes, is normally distributed and P(Y < 20) = 0.6532.
Given that the variances of X and Y are equal, calculate:


i the mean time taken by Haziq to row back home [3]
ii the expected number of days over a period of five years (each of 52 weeks) on which Haziq takes more
than 25 minutes to row back home. [3]


15 The time taken, T seconds, to open a graphics programme on a computer is normally distributed with
mean 20 and standard deviation σ.


Given that P(T > 13 | T < 27) = 0.8, find the value of:
a σ [5]
b k for which P(T > k) = 0.75. [3]


16 A law firm has found that their assistants make, on average, one error on every 36 pages that they type.
A random sample of 90 typed documents, with a mean of 62 pages per document, is selected. Given
that there are more than 140 typing errors in these documents, find an estimate of the probability that
there are fewer than 175 typing errors. [6]


CROSS-TOPIC REVIEW EXERCISE 3


1 a The following table shows the probability distribution for the random variable X.

x          0      1      2      3
P(X = x)   1/k    2/10   5/20   1/20

i Show that k = 2. [2]
ii Calculate E(X) and Var(X). [3]
iii Find the probability that two independent observations of X have a sum of less than 6. [2]


b The following table shows the probability distribution for the random variable Y.

y          0      1      2      3
P(Y = y)   0.1    0.2    0.3    0.4

If one independent observation of each random variable is made, find the probability that X + Y = 3. [3]


2 The random variable Y has a geometric distribution such that ET = ae Find P(X <3). [3]


3 The variable X has a normal distribution with mean μ and standard deviation σ. Given that
P(X < 32.83) = 0.834 and that P(X = 27.45) = 0.409, find the value of μ and of σ. [4]


4 The length of time, in seconds, that it takes to transfer a photograph from a camera to a computer can be
modelled by a normal distribution with mean 4.7 and variance 0.7225. Find the probability that a
photograph can be transferred in less than 3 seconds. [3]


5 The mass of a berry from a particular type of bush is normally distributed with mean 7.08 grams and
standard deviation σ. It is known that 5% of the berries have a mass of exactly 12 grams or more.


a Find the value of σ. [3]
b Find the proportion of berries that have a mass of between 6 and 8 grams. [3]


6 The time taken, in minutes, to fit a new windscreen to a car is normally distributed with mean μ and
standard deviation 16.32. Given that three-quarters of all windscreens are fitted in less than 45 minutes, find:


a the value of μ [3]


b the proportion of windscreens that are fitted in 35 to 40 minutes. [3]


7 The mid-day wind speed, in knots, at a coastal resort is normally distributed with mean 12.8 and standard deviation σ.


a Given that 15% of the recorded wind speeds are less than 10 knots, find the value of σ. [3]


b Calculate the probability that exactly two out of 10 randomly selected recordings are less than 10 knots. [3]


c Using a suitable approximation, calculate an estimate of the probability that at least 13 out of 100 randomly
selected recordings are over 15.5 knots. [4]


Cambridge International AS & A Level Mathematics: Probability & Statistics 1


8 A technical manual contains 10 pages of text, 7 pages of diagrams and 3 pages of colour illustrations. Four different
pages are selected at random from the manual. Let X be the number of pages of colour illustrations selected. 


a Draw up the probability distribution table for X. [4] 
b Find: 
i E(X) [2] 


ii the probability that fewer than two pages with colour illustrations are selected, given that at least one 
page with colour illustrations is selected. [2] 


9 A fair six-sided die is numbered 1, 1, 2, 3, 5 and 8. The die is rolled twice and the two numbers obtained
are added together to give the score, X.


a Find E(X). [4]
b Given that the first number rolled is odd, find the probability that X is an even number. [2]


10 The following table shows the probability distribution table for the random variable Q.


Y 1 2 3 
x-2 a x—3 
x+1 18 x+4 
a Find the value of x. [3] 
b Evaluate Var(Q). [2] 


11 Research shows that 17% of children are absent from school on at least five days during winter because of ill
health. A random sample of 55 children is taken.


a Find the probability that exactly 10 of the children in the sample are absent from school on at least five days
during winter because of ill health. [2]


b Use a suitable approximation to find the probability that at most seven children in the sample are absent from 
school on at least 5 days during winter because of ill health. [4] 


c Justify the approximation made in part b. [1] 
12 The ratio of adult males to adult females living in a certain town is 17 : 18, and : of these adults, independent of


gender, do not have a driving license. 


a Show that the probability that a randomly selected adult in this town is male and has a driving license is 


17 
] to —. 1 
equal to 7 [1] 
b Find the probability that, in a randomly selected sample of 25 adults from this town, from 8 to 10 inclusive are 
females who have a driving license. [4] 


13 A fair eight-sided die is numbered 2, 2, 3, 3, 3, 4, 5 and 6. The die is rolled up to and including the roll on which
the first 2 is obtained. Let X represent the number of times the die is rolled. 
a Find E(X). [1] 


b Show that P(X =4)= A, i] 
c The die is rolled up to and including the roll on which the first 2 is obtained on 20 occasions.
Find, by use of a suitable approximation, the probability that X = 4 on at least half of these 20 occasions. [4]


d Fully justify the approximation used in part c. [1] 


14 A student wishes to approximate the distribution of X ~ B(240, p) by a continuous random variable Y that
has a normal distribution.


a Find the values of p for which:


i approximating X by Y can be justified [3]
ii Var(Y) < 45. [3]
b Find the range of values of p for which both the approximation is justified and Var(Y) < 45. [2]


PRACTICE EXAM-STYLE PAPER


Time allowed is 1 hour and 15 minutes (50 marks) 




1 


A mixed hockey team consists of five men and six women. The heights of individual men are denoted by
h_m metres and the heights of individual women are denoted by h_w metres. It is given that Σh_w = 9.84,
Σh_m = 9.08 and Σh_w² = 16.25.


a Calculate the mean height of the 11 team members. [2]
b Given that the variance of the heights of the 11 team members is 0.0416 m², evaluate Σh_m². [3]


2 A and B are events such that P(A ∩ B′) = 0.196, P(A′ ∩ B) = 0.286 and
P[(A ∪ B)′] = 0.364, as shown in the Venn diagram opposite.


a Find the value of x and state what it represents. [2]
b Explain how you know that events A and B are not mutually exclusive. [1]
c Show that events A and B are independent. [2]

3 Meng buys a packet of nine different bracelets. She takes two for herself and then shares the remainder
at random between her two best friends.
a How many ways are there for Meng to select two bracelets? [1]
b If the two friends receive at least one bracelet each, find the probability that one friend receives
exactly one bracelet more than the other. [4]

4 Every Friday evening Sunil either cooks a meal for Mina or buys her a take-away meal. The probability that
he buys a take-away meal is 0.24. If Sunil cooks the meal, the probability that Mina enjoys it is 0.75, and if
he buys her a take-away meal, the probability that she does not enjoy it is x. This information is shown in
the following tree diagram.

[Tree diagram: Sunil cooks a meal, after which Mina enjoys the meal with probability 0.75 or does not enjoy the meal; Sunil buys a take-away meal with probability 0.24, after which Mina enjoys the meal or, with probability x, does not enjoy the meal]

The probability that Sunil buys a take-away meal and Mina enjoys it is 0.156.

a Find the value of x. [2]
b Given that Mina does not enjoy her Friday meal, find the probability that Sunil cooked it. [3]

5 The following histogram summarises the total distance covered on each of 123 taxi journeys provided for
customers of Jollicabs during the weekend.

[Histogram: horizontal axis Distance (km); the surviving axis labels run from 9 to 14]

a Find the upper boundary of the range of distances covered on these journeys. [1]
b Estimate the number of journeys that covered a total distance from 8 to 13 kilometres. [2]
c Calculate an estimate of the mean distance covered on these 123 journeys. [3]


6 In each of a series of independent trials, a success occurs with a constant probability of 0.9.


a The probability that none of the first n trials results in a failure is less than 0.3. Find the least 
possible value of n. [2] 


b State the most likely trial in which the first success will occur. [1] 


c Use a suitable approximation to calculate an estimate of the probability that fewer than 70 successes 
occur in 80 trials. [4] 


7 The following stem-and-leaf diagram shows the number of shots taken by 10 players to complete
a round of golf.


6 | 1 y
7 | 0 1 2 x
8 | 0 1 9 9

Key: 6 | 1 represents 61 shots


a Given that the median number of shots is 74.5 and that the mean number of shots is 75.4, 
find the value of x and of y. [3] 


The numbers of golf shots are summarised in a box-and-whisker diagram, as shown. 


[Figure: box-and-whisker diagram with a length marked as 16.8 cm and a box of width b cm]


b Given that the whisker is 16.8 cm long, find the value of b, if the width of the box is b cm. [3]


c Explain why the mode would be the least appropriate measure of central tendency to use as the average 
value for this set of data. [1] 


8 To conduct an experiment, a student must fit three capacitors into a circuit. He has eight to choose 
from but, unknown to him, two are damaged. He fits three randomly selected capacitors into the circuit. 
The random variab!e X is the number of damaged capacitors in the circuit. 


a Draw up the probability distribution table for X. [3] 
b Calculate Var(X). [3] 


c The student discovers that exactly one of the capacitors in the circuit is damaged but he does not 
know which one. He removes one capacitor from the circuit and replaces it with one from the 
box, both selected at random. Find the probability that the circuit now has at least one damaged 
capacitor in it. [4] 
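For parts a and b of question 8, the distribution of X can be built by counting selections. The sketch below assumes sampling without replacement from 8 capacitors of which 2 are damaged, as the question describes; the variable names are illustrative only.

```python
from fractions import Fraction
from math import comb

total, damaged, chosen = 8, 2, 3

# P(X = x): choose x of the damaged and (chosen - x) of the undamaged capacitors
dist = {
    x: Fraction(comb(damaged, x) * comb(total - damaged, chosen - x),
                comb(total, chosen))
    for x in range(min(damaged, chosen) + 1)
}

mean = sum(x * p for x, p in dist.items())
variance = sum(x * x * p for x, p in dist.items()) - mean ** 2

for x, p in dist.items():
    print(x, p)
print(mean, variance)
```

Exact fractions are used so that the table entries and Var(X) are not affected by rounding.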
THE STANDARD NORMAL DISTRIBUTION FUNCTION 


If Z is normally distributed with mean 0 and variance 1, the table gives the value of 
Φ(z) for each value of z, where 
Φ(z) = P(Z < z). 
Use Φ(−z) = 1 − Φ(z) for negative values of z. 

z   | 0      1      2      3      4      5      6      7      8      9      |  1  2  3  4  5  6  7  8  9 (ADD) 
0.0 | 0.5000 0.5040 0.5080 0.5120 0.5160 0.5199 0.5239 0.5279 0.5319 0.5359 |  4  8 12 16 20 24 28 32 36 
0.1 | 0.5398 0.5438 0.5478 0.5517 0.5557 0.5596 0.5636 0.5675 0.5714 0.5753 |  4  8 12 16 20 24 28 32 36 
0.2 | 0.5793 0.5832 0.5871 0.5910 0.5948 0.5987 0.6026 0.6064 0.6103 0.6141 |  4  8 12 15 19 23 27 31 35 
0.3 | 0.6179 0.6217 0.6255 0.6293 0.6331 0.6368 0.6406 0.6443 0.6480 0.6517 |  4  7 11 15 19 22 26 30 34 
0.4 | 0.6554 0.6591 0.6628 0.6664 0.6700 0.6736 0.6772 0.6808 0.6844 0.6879 |  4  7 11 14 18 22 25 29 32 
0.5 | 0.6915 0.6950 0.6985 0.7019 0.7054 0.7088 0.7123 0.7157 0.7190 0.7224 |  3  7 10 14 17 20 24 27 31 
0.6 | 0.7257 0.7291 0.7324 0.7357 0.7389 0.7422 0.7454 0.7486 0.7517 0.7549 |  3  7 10 13 16 19 23 26 29 
0.7 | 0.7580 0.7611 0.7642 0.7673 0.7704 0.7734 0.7764 0.7794 0.7823 0.7852 |  3  6  9 12 15 18 21 24 27 
0.8 | 0.7881 0.7910 0.7939 0.7967 0.7995 0.8023 0.8051 0.8078 0.8106 0.8133 |  3  5  8 11 14 16 19 22 25 
0.9 | 0.8159 0.8186 0.8212 0.8238 0.8264 0.8289 0.8315 0.8340 0.8365 0.8389 |  3  5  8 10 13 15 18 20 23 
1.0 | 0.8413 0.8438 0.8461 0.8485 0.8508 0.8531 0.8554 0.8577 0.8599 0.8621 |  2  5  7  9 12 14 16 19 21 
1.1 | 0.8643 0.8665 0.8686 0.8708 0.8729 0.8749 0.8770 0.8790 0.8810 0.8830 |  2  4  6  8 10 12 14 16 18 
1.2 | 0.8849 0.8869 0.8888 0.8907 0.8925 0.8944 0.8962 0.8980 0.8997 0.9015 |  2  4  6  7  9 11 13 15 17 
1.3 | 0.9032 0.9049 0.9066 0.9082 0.9099 0.9115 0.9131 0.9147 0.9162 0.9177 |  2  3  5  6  8 10 11 13 14 
1.4 | 0.9192 0.9207 0.9222 0.9236 0.9251 0.9265 0.9279 0.9292 0.9306 0.9319 |  1  3  4  6  7  8 10 11 13 
1.5 | 0.9332 0.9345 0.9357 0.9370 0.9382 0.9394 0.9406 0.9418 0.9429 0.9441 |  1  2  4  5  6  7  8 10 11 
1.6 | 0.9452 0.9463 0.9474 0.9484 0.9495 0.9505 0.9515 0.9525 0.9535 0.9545 |  1  2  3  4  5  6  7  8  9 
1.7 | 0.9554 0.9564 0.9573 0.9582 0.9591 0.9599 0.9608 0.9616 0.9625 0.9633 |  1  2  3  4  4  5  6  7  8 
1.8 | 0.9641 0.9649 0.9656 0.9664 0.9671 0.9678 0.9686 0.9693 0.9699 0.9706 |  1  1  2  3  4  4  5  6  6 
1.9 | 0.9713 0.9719 0.9726 0.9732 0.9738 0.9744 0.9750 0.9756 0.9761 0.9767 |  1  1  2  2  3  4  4  5  5 
2.0 | 0.9772 0.9778 0.9783 0.9788 0.9793 0.9798 0.9803 0.9808 0.9812 0.9817 |  0  1  1  2  2  3  3  4  4 
2.1 | 0.9821 0.9826 0.9830 0.9834 0.9838 0.9842 0.9846 0.9850 0.9854 0.9857 |  0  1  1  2  2  2  3  3  4 
2.2 | 0.9861 0.9864 0.9868 0.9871 0.9875 0.9878 0.9881 0.9884 0.9887 0.9890 |  0  1  1  1  2  2  2  3  3 
2.3 | 0.9893 0.9896 0.9898 0.9901 0.9904 0.9906 0.9909 0.9911 0.9913 0.9916 |  0  1  1  1  1  2  2  2  2 
2.4 | 0.9918 0.9920 0.9922 0.9925 0.9927 0.9929 0.9931 0.9932 0.9934 0.9936 |  0  0  1  1  1  1  1  2  2 
2.5 | 0.9938 0.9940 0.9941 0.9943 0.9945 0.9946 0.9948 0.9949 0.9951 0.9952 |  0  0  0  1  1  1  1  1  1 
2.6 | 0.9953 0.9955 0.9956 0.9957 0.9959 0.9960 0.9961 0.9962 0.9963 0.9964 |  0  0  0  0  1  1  1  1  1 
2.7 | 0.9965 0.9966 0.9967 0.9968 0.9969 0.9970 0.9971 0.9972 0.9973 0.9974 |  0  0  0  0  0  1  1  1  1 
2.8 | 0.9974 0.9975 0.9976 0.9977 0.9977 0.9978 0.9979 0.9979 0.9980 0.9981 |  0  0  0  0  0  0  0  1  1 
2.9 | 0.9981 0.9982 0.9982 0.9983 0.9984 0.9984 0.9985 0.9985 0.9986 0.9986 |  0  0  0  0  0  0  0  0  0 
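Entries in the table above can be checked numerically. The sketch below is one possible way, using the error function from Python's standard library; the function name `phi` is an assumption made for illustration, and the final lines mimic a table reading for z = 1.234 (the entry for 1.23 plus the ADD value in column 4 of row 1.2).

```python
from math import erf, sqrt


def phi(z):
    """Standard normal distribution function P(Z < z)."""
    return 0.5 * (1 + erf(z / sqrt(2)))


# Direct evaluation, rounded as in the table
table_entry = round(phi(1.23), 4)

# Negative z via the symmetry rule: phi(-z) = 1 - phi(z)
negative_value = 1 - phi(1.23)

# Table reading for z = 1.234: entry for 1.23, plus 7 in the fourth decimal place
from_table = 0.8907 + 7 / 10000

print(table_entry, negative_value, from_table, phi(1.234))
```

The table reading and the direct calculation should agree to about four decimal places, which is the precision the table is designed to give.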


Critical values for the normal distribution 


The table gives the value of z such that P(Z < z) = p, where Z ~ N(0, 1). 


p | 0.75  | 0.90  | 0.95  | 0.975 | 0.99  | 0.995 | 0.9975 | 0.999 | 0.9995 
z | 0.674 | 1.282 | 1.645 | 1.960 | 2.326 | 2.576 | 2.807  | 3.090 | 3.291 
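The critical values can be reproduced with the inverse of the standard normal distribution function. A minimal sketch, assuming Python 3.8 or later so that `statistics.NormalDist` is available; the names `probabilities` and `critical` are illustrative.

```python
from statistics import NormalDist

probabilities = [0.75, 0.90, 0.95, 0.975, 0.99, 0.995, 0.9975, 0.999, 0.9995]

# inv_cdf(p) returns the z for which P(Z < z) = p when Z ~ N(0, 1)
standard_normal = NormalDist(mu=0, sigma=1)
critical = {p: round(standard_normal.inv_cdf(p), 3) for p in probabilities}

for p, z in critical.items():
    print(p, z)
```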
Answers 


Answers 3 a 33 
b Boundaries at 1.2,1.3,1.6,1.8,1.9 m. 
1 Representa tion of data Densities œ 170,110, 210, 80. 
c 29 
ici I; A 
A Ki EE PI E E E 360850. 
a x ve Boundaries at 0, 5, 15, 30,u cm. 
2 Frequencies are equal because areas are equal. E 64 
3 a v7 b 6 Densities œ 12.8, 23.2, 16, T30 
c i 456 ii 246 
Exercise 1A 5 a 285-2.55=03 
1 0|12334456789  Key:1]0 b Boundaries at 2.55, 2.85, 3.05, 3.25, 3.75 min. 
1012333536 represents 10 Densities œ 59. 125. 100. 20 
2/06 Visits i pa? at 
2 a 15|02689 Key:15|0 c 2min 45s or 165s. 
16/0235 represents . aa 
17/025 150 coins M R i A 
6 a 324 
b- gleis b i 30 ii 92 
3 a 18 b 8 c 20% © Proof 
a ia i We d 440; Population and sample proportions are 
4 a 88 b $10.80 c Oand3 thesare: 
5 a BatsmanP | | BatsmanQ 7 a 480 b 130 c 110 
2/01 Key:6 |3 |1 17 
98776 |3|16 represents 36 8 a — b 399 c 12.6cm 
87411 |4|258 runs for P and 23 l 
99732 |5|1267 31 runs fi oat 7 
alae Ee 9 a 12:8:3 b n=73 
7/17 : 7 
b i Q;scored more runs. č i 20 ii 36 
ii P; scores are less spread out. d 0.215<k< gom 
6 a Wrens(10) |_| Dunnocks (10) We eii be certain only that 0.1<a<0.4 and that 
987 }1| 79 represents 18 eggs 10 a a=i59,b=636 b 23.5kg 
433210 |2 | 2234 for a wren and 19 
21578 eggs for a dunnock 11 4hd 
ely? 5n 
b 218 c 93% 12 33cm 
7 The girl who scored 92%; 5 boys. 13 p = 29, q = 94 
Exercise 1B Exercise 1C 
1 a 175 and 325years 1 a Points plotted at (1.5, 0), (3, 3), (4.5, 8), 
b All150years (6.5, 32), (8.5, 54), (11, 62), (13, 66). 
B i 25,175, 325, 475, 625 years. 
c SURANA at 25,175, 325, 475, 625 years pia i 7.85 
Densities « 15,18, 12, 6 (such as j 19.5 
0.1, 0.12, 0.08, 0.04). AERES 
d 15 b 
<14.5 | <19.5| < 29.5} < 39.5| < 44.5 
2 a 70 
Boundaries at 4, 12, 24, 28 grams. 3 16 41 65 70 
Densities ∝ 28, 33, 17.5. 
c 310 
b 


b 
c 
d 


=> S a 


b 


c 


d 


b 


c 


Points plotted at (9.5, 0), (14.5, 3), (19.5, 16), 
(29.5, 41), (39.5, 65), (44.5, 70). 

i 34 or 35 ii = 33.25 to 44.5cm 
Points plotted at (0.10, 0), (0.35, 16), 

(0.60, 84), (0.85, 134), (1.20, 156) for A. 

Points plotted at (0.10, 0), (0.35, 8), (0.60, 52), 
(0.85, 120), (1.20, 156) for B. 

i =107 for engine A; = 87 for engine B. 

ii = 108 


= 42 

17; cfs 20 and 37 are precise. 

i 12 ii 28 
k=4.7to 4.8 

It has the highest frequency density. 
i 64 ii 76 

= 74 g 

(12, 304) 


a = 32, b = 45, c = 15, d = 33 


65 b 24 

Ratio of under 155cm to over 155cm is 3:1 for 
boys and 1:3 for girls. 

81 or 82 


There are equal numbers of boys and girls below 
and above this height. 

Polygon or curve through (140, 9), 

(155, 25), (175, 50). 


Points plotted at (18, 0), (20, 27), (22, 78), 

(25, 89), (29, 94), (36, 98), (45, 100). 

27 years and 4 or 5 months 

i 1000 

ii All age groups are equally likely to find 
employment. 


Either with valid reasoning; e.g. underestimate 
because older graduates with work experience 
are more attractive to employers. 


Points plotted at (4.4, 0), (6.6, 5), (8.8, 12), 
(12.1, 64), (15.4, 76), (18.7, 80) for new cars. 
Points plotted at (4, 0), (6, 5), (8, 12), (11, 64), 
(14, 76), (17, 80) for = 100000 km. 


Polygons 17 cars; curves = 16 cars. 


11 


a 


Points plotted at (1.0, 0), (1.5, 60), 

(2.0, 182), (2.5, 222), (3.0, 242) for diameters. 
Points plotted at (2.0, 0), (2.5, 8), (3.0, 40), 

(3.5, 110), (4.0, 216), (4.5, 242) for lengths. 

Least n = 0; greatest n = 28. 

Diameter and length for individual pegs are not 
shown. 

Best estimate is ‘between 171 and 198 inclusive’. 
The length and diameter of each peg should be 
recorded together, then the company can decide 
whether each is acceptable or not. 


Exercise 1D 


1 


Ww 


a 


b 


Any suitable for qualitative data. 


a ae : : 
Pie chart, as — circle easily recognised, or a 


sectional percentage bar chart. 


Histogram; area of middle three columns > half 
total column area. 


a 


Numbers can be shown in compact form on 
three rows; bar chart requires 17 bars, all with 
frequencies 0 or 1. 


Sum = 100 shows that 11 boxes of 100 tiles could 
be offered for sale. 


7 months 

Percentage cf graph; passes below the point 
(12, 100). 

Histogram: Frequency density may be mistaken 
for frequency. 

Pie chart: does not show numbers of trees. 


Pictogram: short, medium, tall; two, three and 
four symbols, each for six trees, plus a key. 


Shows 12,18, 24 and a total of 54 trees. 


60-69 | 70-79 | 80-89 | 90-99 


B A 
26 6 


Any three valid, non-zero frequencies that sum 
to 40. 
c Raw: stem-and-leaf diagram is appropriate. 


Tables 1 and 2 do not show raw marks, so these 
diagrams are not appropriate. 


Table 1: Any suitable for grouped discrete data; 
e.g. histogram. 


Table 2: Any suitable for qualitative data. 
a E.g. He worked for less than 34 hours in 


49 weeks, and for more than 34 hours in 3 weeks. 


b It may appear that Tom worked for 
more than 34 hours in a significant number 
of weeks. 


c Histogram: boundaries at 9, 34 and 44; densities 
∝ 98 and 15. 


Pie chart: sector angles ~ 339.2° and 20.8°. 
Bar chart: frequencies 49 and 3. 
Sectional percentage bar chart: ~ 94.2 and 5.8%. 
a Some classes overlap (are not continuous). 
b Refer to focal lengths as, say, A to 
E in a key. 
Pie chart: sector angles 77.1°, 128.6°,77.1°, 
51.4°, 25.7°. 


Bar chart or vertical line graph: heights 
18, 30, 18, 12, 6. 


Pictogram: symbol for 1, 3 or 6 lenses. 


C | SL | Ma! G | Mo 
14.4) 8.9 | 3.8 | 17.7} 27.4 




End-of-chapter review exercise 1 


1 


i 50 

t Boundaries at 20, 30, 40, 45, 50, 60, 70 g. 
Frequency densities ∝ 2, 3, 10, 12, 5, 1. 

16.5, 3 and 18cm 

a 6 

b Quantitative and continuous 

a 6 


b Five additional rows for classes 
0—4, 20-24, 25-29, 30-34, 35-39. 


a=9,b=2 

a 48 

b 0.7cm 

a 120,180 and 90 
b 6.75cm 


c There is a class between them (not 
continuous). 


a 30 days for region A, 31 days for 
region B. 


b Bindu: unlikely to be true but we cannot tell, as 
the amount of sunshine on any particular day is 
not shown. 


Janet: true (max. for region A is 106h; min. for 
region B is 138h). 


People living in poverty 


[| in hundred thousands 
0 5 10 


Chile 

Sri Lanka 
Malaysia 
Georgia 


Mongolia 


[| as % of country population 
15 


20 25 30 


Mongolia, for example, has the lowest number, but the highest percentage, of people 


living in poverty. 
i Points plotted at: 
(20.5, 10), (40.5, 42), (50.5, 104), 
(60.5, 154), (70.5, 182), (90.5, 200) or 
(20, 10), (40, 42), (50, 104), 
(60, 154), (70, 182), (90, 200) or 
(21, 10), (41, 42), (51, 104), 
(61, 154), (71, 182), (91, 200). 

ii 174 to 180 

iii 58,59 or 60 


2 Measures of central tendency 
Prerequisite knowledge 


1 Mean = 5, median = 4.3, mode = 3.9. 

2 1.94 

Exercise 2A 

1 a Nomode b 16,19 and 21 

2 ‘The’ is the mode. 3 7 for x; −2 for y. 

4 14—20 for x; 3—6 for y. 

5 Most popular size(s) can be pre-cut to serve 
customers quickly, which may result in less wastage 
of materials. 

6 216 7 69 8 73 

Exercise 2B 

1 a 50 b 7.1 c 45 

2 a p=t7 b g=9or-l0 

3 a 23.25 b 1062 c 88 
d 12 e 113.67 

4 a 19 b 3.68825 

5 aal? 

6 a 41 b 24.925 

7 B8% 8 $1846 

9 30years; the given means may only be accurate to 
the nearest month. 

Actual age could be any from 28 yr 8.5 m to 
31 yr 3.5 m. 
10 a Mean($10) is not a good average; 36 of the 37 
employees earn less than this. 
b $7.25 
11 a $143282 


41% means from 1495 to 1531 passengers. 
29% means from 1052 to 1088 passengers. 


b k=252 


12 92cm 
13 a 54.6 b 59.0 
c The scales may have underestimated or 
overestimated masses. Not all tomatoes may 
have been sold (i.e. some damaged and not 
arrived at market). 
14 a 1.5 
b i 1.96 ii 3.48 
c For example, bar chart with four groups of four 
bars, or separate tables for boys and girls. 
15 n=12 
None of the 120 refrigerators have been removed 
from the warehouse. 
16 One more day required. 
He works at the same rate or remaining 
rooms take a similar amount of time (are of a 
similar size). 
17 a i 5.89cm ii 5.76 cm 
b 152.0° 
Exercise 2C 
1 a 74 b 94 c 64 
2 18 3 204 4 40.35mm 
5 -0.8 
6 a To show whether the cards fit (x < 0), or not 
(x>0). 
b 2% 
c —0.0535+24= 23.9465mm 
7 Fidel; Fidel’s deviations > 0, 
Ramon’s deviations < 0. 
8 63.58; accurate to 1 decimal place. 
9 a 3n; 60° b 90n 
10 3.48 11 $1.19 
Exercise 2D 
1 5700; the total mass of the objects, in grams. 
2 a Σ5x or 5Σx 
b Σ0.001x or 0.001Σx 
3 Σ0.01w or 0.01Σw 
4 3.6 
5 a Calculate estimate in mph, then multiply by 8/5 
or 1.6. 
b 19.7625 × 1.6 = 31.62 km/h 
(-1.8, 2.8) 
TE(5.2, 1.2) → (19, −2) 
ET(5.2, −1.2) → (9, 14) 
Location is dependent on order of 
transformations. 
p = 40; q = 12000; $75000 
Appears unfair; the smaller the amount invested, 
the higher the percentage profit. 


281% g/cm? 


b (26,-6) 


Exercise 2E 


1 


a 15 
b Median; it is greater than the mean (12.4). 


c For example, being unable to pay a bill because 
of low earnings. 


11.5 

Negatively skewed; mean = 10.9 < median. 
Median = 6; mode = 8 

Median is central to the values but occurs less 
frequently than all others. 


7 > 7D 


Mode is the most frequently occurring value but 
is also the highest value. 


c Two incorrect 

a =44min 

b 2.8 and 6.4min 

Points plotted at (0, 0), (0.2, 16), (0.3, 28), 

(0.5, 120), (0.7, 144), (0.8, 148). 

Median = 0.4kg 

a 92 b 32 

Mode (15) and median (16) unaffected; mean 

decreases from 16 to 14.75. 

a Points plotted at (85, 0), (105, 12), (125, 40), 
(145, 94), (165, 157), (195, 198), (225, 214), 
(265, 220). 

Polygon and curve give median ~150 days. 

b Likely to use whichever is the greatest. 


Estimate of mean (152.84) appears 
advantageous. (They could consider using the 
greatest possible mean, 164.41.) 


10 


11 


12 


13 


14 


15 




‘Average’ could refer to the mean, the median or the 
mode. 

Median > 150; Mean < 150. 

150 is close to lower boundary of modal class. 
Claim can be neither supported nor refuted. 


There is no mode. 


Mean ($1000000) is distorted by the expensive 
home. 


Median ($239000) is the most useful. 

a p=12,q=40,r=54 

b i Reflection in a horizontal line through cf 
value of 30. 


ii Median safe current = median unsafe 
current 


a First-half median is in 1–2; second-half median 
is in 4–5. 
b i 3 
ii First half data are positively skewed (least 
possible mean is 100.8 s). 
a Points plotted at (0, 0), (26, 15), (36, 35), 
(50, 60), (64, 75), (80, 80). 
Median = 39% 
New points at (0, 0), (16, 9.23), (26, 15), 
(40, 42.1), (54, 64.2), (80, 80). 


ii Mode = mean = median = 8. 
b No effect on mode or median. Mean increases 
to 9. Curve positively skewed. 


c b=~—11; No effect on mode or median. Curve 
negatively skewed. 


Any symmetrical curve with any number of modes 
(or uniform). 


a Symmetrical; mean = median = mode 
b Chemistry: negatively skewed; Physics: positively 
skewed. 
End-of-chapter review exercise 2 


1 


10 


11 
12 
13 


14 
15 


a Mean < median and mode. 

b Mean > median and mode. 

c Mean = median and mode. 

n = 22; 623g 

a Mode=13 

b Median = 28 

ON) 

a 15.15 

b 13.3 

a 6 

b i 14 
ii 25 

25 

a Proof 

b $0.18; it is an estimate of the mean 
amount paid. 

a Mode indicates the most common response. 


Median indicates a central response (one 
of the options or half-way between a pair). 


b Allows for a mean response, which indicates 
which option the average is closest to. 


a l 


b No; it is the smallest value and not at all 
central. 


c ll 

d Positively skewed; mode < median < mean. 

i Boundaries at 0.05, 0.55, 1.05, 2.05, 3.05, 4.55h. 
Frequency densities ∝ 22, 30, 18, 30, 14. 

ii 2.1h 

16.4 

81 

a Mode = 0, mean = 1, median = 0 

b Mean; others might suggest that none of the 
items are damaged. 

a 4006 — 2980 = $1026 

1.95 


b $3664 


3 Measures of variation 
Prerequisite knowledge 


1 
2 


16cm 


a 4.5 b 27.3 


Exercise 3A 


Box plots given by: smallest ... Q1 ... Q2 ... Q3 ... largest / 
Item (units), as appropriate. 


1 a 
c 
e 
2 a 
b 
3 a 
b 
c 
4 a 
b 
5 a 
b 
6 a 
b 
7 a 
b 
8 
9 


ao Ss SP GO ® 


b 35 and 20 
d 96 and 59 


25 and 17 

65 and 25 

8.5 and 5.6 

Range = 3.3; IQR = 1.75 

Negative 

41 and 18 

9 ...28...37...46...50/ Marks. 

Q; = 20, -Q 

Yes, if the range alone is considered. 

Hockey: 11 ... 13 ... 17 ... 20 ... 24 / Fouls. 

Football: 10 ....18.5... 20... 22.5... 23 / Fouls. 

with the same scale. 

Fewer fouls on average in hockey but the 

numbers varied more than in football. 

Ranges and IQRs are the same (35 and 18) but 

their marks are quite different. 

One of median (33/72) or mean (33/72) and 

one of range or IQR. 

Points plotted at (35, 0), (40, 20), (45, 85), 

(50, 195), (55, 222), (70, 235), (75, 240). 

35... 43.1... 46.6... 49.3 ...75/ Speed (km/h), 

parallel to speed axis. 

Positive skew 

i Males: 0...0...3...14...39/ Trips abroad. 
Females: 3... 5... 12 ... 20 ... 22 / Trips abroad. 
Same scale. 

ii Males: range = 39; IQR = 14; median = 3 
Females: range = 19; IQR = 15; median = 12 
On average, females made more trips 
abroad than males. Excluding the male 
who made 39 trips, variation for males and 
females is similar. 


No, there are no data on the number of different 
countries visited. 


=0.130Q b =0.345Q 
= 68th percentile d =0.095Q 
52 cm² 


4.0 ... 25.8 ... 33.2 ... 38.8 ... 56.0 / Area (cm²). 
15.2 to 16.0 cm² 
Area < 6.3 cm² or area > 58.3 cm². 


Estimate ~ 8 (any from 0 to 15) 
12 
a Points plotted at (−1.5, 0), (−1.0, 24), (−0.5, 70), 
(0, 131), (0.5, 165), (1.0, 199), (1.5, 219), (2.5, 236). 
b 89.9° and 1.3° 
c 18% 
a 10 b 30 
a Points plotted at (0, 0), (4, 2), (11,21), 
(17, 44), (20, 47), (30, 50). 
b i =0.06g 
ii =~ 0.12g 
c n=40 
Variation is quite dramatic (from 0 up to a 
possible 3% of mass). 


Mushrooms are notoriously difficult to identify 
(samples may not all be of the same type). 
Toxicity varies by season. 
Should compare averages and variation (and 
skewness) and assess effectiveness in reducing 
pollution level for health benefits. 


Exercise 3B 


1 


© © N A 


10 


a Mean = 37.5, SD =12.4 
b Mean = 0.45, SD = 9.23 
a Var(B) = Var(C) = Var(P) = 96 
b The three values are identical. 


No; mean marks are not identical 
(B = 33, C=53 and P= 63). 


Mean = 154 or 1.69; variance = 1.64 
a Mean=2; SD = 0.8923 
b Q1 = Q3 = 2, so IQR = 0. 
That the middle 50% of the values are identical. 
a Girls: mean = 40, SD = 13.0 min 
Boys: mean = 40, SD = 16.3 min 


b i On average, the times spent were very 
similar. 


ii Times spent by boys are more varied than 
times spent by girls. 


5.94cm 

k = 6; Var(x) = 2.72 

a a=13,b=40 b 6.23cm 

k = 43; SD = 12.5 km; IQR = 24 km; IQR ≈ 2 × SD 
a Mean = 0.97t; SD = 0.44t 


b Mean decreases to 0.73t; SD increases 
to 0.57t. 


11 


12 




x=12,v=18 
Gudrun is 22 years old. 
Variance increases from 69.12 to 72.88 years². 
None of the original 50 staff have been replaced. 
a Mean decreases by 11.6cm. 

Median decreases by 40.3cm. 
b SD increases by 116cm. 

IQR increases by 216cm. 

(Range increases by 344cm.) 


c Discs get closer to P, but distances become more 
varied. 


d Proof 


Exercise 3C 


1 


A nh WwW NY 


10 


b 9.17 
d 161800 


a 65.375 

c 120 

e 28 

n=20;x =11 

2.15 

Mean = 60.2 kg; SD = 14.1 kg 


a Proof 


b 27.2psi 


(52n² − 4915n + 616549) / (n + 29) 


Exercise 3D 


ann fk WN e 


Men 8kg; women 6kg 

1.5 

7.92mm and 24009.8 

8 

n=15 

Mean is not valid (it is 165cm); standard deviation 
is valid. 

Mean = 4h20 min; SD = 7.3 min 


If 10-minute departure delay avoids busy traffic 
conditions. 


0.96 cm² 
9 a Mean = 8; SD = 4 b Mean = 7; SD = 4 
c (n² − 1)/3 = variance of the first n positive odd integers. 
10 a 43 b Proof c 1.179 
11 a Proof 

b Σy = 1104, Σy² = 15 416 c 20.1376 
12 a 162.14 cm. 

b (Σx² = 5 720 640, Σy² = 7 445 100); 

Var(X) = 42.1004 cm² 

Exercise 3E 
1 $0.64 2 85 3 264 


4 a 133 and 2673 
b 0.457 °C, using 133 and 2673. 
c 0.209 (°C)² 
5 = $75600 
6 a 27°F 
b Mean=12.5°C;SD=4.5°C 
7 a Fruit & veg; mean unchanged, so total 
unchanged. 


b Tinned food; mean increased but standard 
deviation unchanged. 
c Bakery; mean and standard deviation decreased 
by 10%. 
8 26m 9 18.0% increase 


End-of-chapter review exercise 3 


1 a Proof 
b $0.917 or $0.92 
2 a 0 
b 0,1,2,3 0r4 
Five 
4 a 0.319m 


b Mean increased by 1.5cm (to 90cm); SD 
unchanged. 


5 a Marks are improving and becoming more 
varied. 


b Third test 
c First test positive; second test negative 
a 97.92cm 
b 11.5cm 
a Range = 139; IQR = 8;SD = 37.7 
b IQR; unaffected by extreme value (180). 
8 i 173cm 
ii 834728.6 and 4.16cm 


10 


11 


12 


13 
14 


15 
16 


17 
18 


45.8 and 14.9s 


i      Squad A |    | Squad B 
               |  7 | 5 7 9          Key: 1 | 9 | 4 
         4 4 2 |  8 | 2 3 4 6        represents 91 kg 
     9 8 7 6 1 |  9 | 1 4 5 6        for squad A and 
       9 7 4 0 | 10 | 1 8            94 kg for squad B 
           6 5 | 11 | 1 3 5 
             2 | 12 | 

ii 18kg 
iii 103.4kg 
i 126.5cm 


ii 4908.52cm? 

i Mean = 40.9 or 40$; SD =8.30 

ii 8.41 

5514 

SD increases by 68.4%. 

IQR increases by 9.30% (or 6.90%, depending on 
method). 

Proportional change in SD is much greater than 
in IQR. 

14.0cm 

a SD=21.5 

b Mean = -2;SD =21.5 


Mean is affected by addition of —202 
but SD is unaffected. 


a 5.6>2x2.75 b 803#140?;132444 c 1.63 
a 19.8-18=1.8 
b Σa² = 1964.46, Σb² = 2278.12; 1.99 years² 


Cross-topic review exercise 1 


1 


a 25 


b Player A = 25, player B= 21 


c€ 1/89 Key: 1 | 8 
2);0112234 represents 
2/5678 18 games 
31/3 


a 10-15 and 26-30 
26-15=11 


0 1-9 |10—14 |15-24 |25-30 |31—40 
11 


12 


13 


a 
b 
c 


17.5 
126.3125 
Student A and student F 


Higher average and less varied growth. 


S om aTa 


a 


he 


ii 


iii 


iv 


0/89 Key: 1 |1 
3 88 represents 11 
9 unwanted emails 


... 14... 20 ... 27 ... 36 / Unwanted emails. 
5.94 and 6.685 

Mean = 2990000 or 2.99 × 10⁶ 
SD = 366 151 or 3.66151 × 10⁵ 
32 

70 and 75km/h 

72.3km/h 

Proof 

36.09g and 0.67 g 

0.4489 g? 

i 75 


12 

50 and 1.80 

C = 50 − S 

SD(C) = SD(aS + b) 

a = −1 and b = 50 

1.48 

79 

a=7,b=9 

Median = 0.825 cm; IQR = 0.019 cm 
q=4,r=2 

X: 0.802 ... 0.814... 0.825 ... 0.833 ... 0.848 / 
Length (cm) 

Y: 0.811... 0.824 ... 0.837 ... 0.852 ... 0.869 / 
Length (cm) 

Same scale 


Longer on average in Y; less varied in X. 


4 Probability 


Prerequisite knowledge 


1 


30 


1 


12 




n(A U B’)=4 and n( 4^ B)=1 


Exercise 4A 
1 2 
1 — b = 
* 36 3 
2 a The team’s previous results. 
b 8 


c They may win some of the games that they are 
expected to draw. 


3 $2 
4 a 300 b Atleast 240 
5 a 5 b 15 
6 50 
7 3 
8 
1 
à 1953 
Exercise 4B 
2 2 5 
4 b < 2 
1 a 3 3 c 6 
2 a Girls who took the test. 
23 
b a 
40 
. 3 . 10 
3 a i 5 ii Tl 


b Not a female sheep. Not a male goat. 
4 ai (3,3) 
ii (2,4) and (4,2) 
iii (2, 2), (4, 4), (6, 6) 


b X,Y and Z are not mutually exclusive. 


5 a 1 b tl 
2 8 
6 a a=7,b=2,c=6 
3 13 
b i 5 ii 5 
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To a 7 a i 0.544 ii 0.3264 iii 0.4872 
b The result in any event has no effect on 
probabilities in other events. 
E.g. winning one event may increase an athlete’s 
confidence in others. 
5 3 
” 2 p2 
T 8 
b 9 a Untrue. Any number from 0 to 10 may be 
) delivered; 9 is the average. 
8 44% b 0.125 c 0.49 
10 6 7 
b a 2 p & gor 9 111 
10 — or 0.36 b — or 0.2775 
11 11 22 a 75 or 400 or 
10 a Students who study Pure Mathematics and 11 a 0.84 b 0.9744 
Statistics but not Mechanics. 
12 a 1 b 2 
b 89 ii 6 4 8 
c Mechanics, Statistics, Pure Mathematics 13 512 
11 a No; P(X AY)#+#0 or equivalent. 14 a Ol b 0.15 c 0.3 
12 a AandC b 0.22 a al os 25 
2 31 29 49 
e == = b k =8; — 
mag b 75 © 75 , 625 ; 
232 r r 
14 a 0.6 b 0.4 16 a i z ii a 
15 a b 0 al 
1 ll 108 
Exercise 4D 
1 0.63 
2 0.28 
3 a 0.32 b 0.48 
b 19; they had not visited Burundi. 4 a 
c They pee visited Angola or Burundi but not b ii 0.06 
Cameroon; 15. 
5 a 
a 2 
9 
Exercise 4C 
1 1 
1 1 1 
2 ae, — = 
a 36 ea © 9 
3 a 0.012 b 0.782 
4 0.42 
5 a 0.84 b 085 
6 a 0.343 b 0.441 
10 


11 


12 


6 19 13 

Noi 53 * 38” 25 

20 32 50 

s; — =x — 

80 80 80 

Epa P(B) => PAAD. 
i6 g 2 
1.9.3 

No; >+ x> 

S ie a 

A and B both occur when, for example, 1 and 2 
are rolled; P(A ∩ B) ≠ 0 


| 


PUY) = 4, PLY) = 5, PX andy) = - 


N 


No; X and Y both occur when, for example, 1 
and 5 are rolled; P(X ∩ Y) ≠ 0 


1 27 1 
P(V) ==, (W) ==, PVAW) = — 
LF) z? W) 64° Vaw) 16 


1 1 A 


O76 8” 64 


Ownership is not independent of gender; 


60 108 110 
g for M and B: 22 y 28. 
€-8. 10r Me and 500 7 200 200 


Females 54.3%, males 55.0%. 


If ownership were independent of gender, these 
percentages would be equal. 


a =1860, b = 4092, c = 1488 


36 54 | 138 
x 


Southbound vehicles; —— = 
18 54, 69. 
207 207 207 


=Z X or 
207 207 207 


Exercise 4E 


i 


2 


a 


E 10 
7 13 


— Ajur wlth 
= 
= 
N 


S| 
a 
ll 


5 . 12 

16 23 
Those who expressed an interest in exactly 
two (or more than one) career, or any other 


appropriate description. 




20 8 

5 a 29 b 39 

6 a S b 47 tas 
5 57 


7 a 10% of the staff are part-time females. 
b a=0.2,b=0.4,c=0.3 


. 4 . 3 we 4 
c is ii > iii — 
4 7 4 i 9 
10 a Proof 
b P(3)= 0.08, P(2) = 0.16, P(1) = 0.75 
c Z or 0.758 
32 
— 2 
d 107 or 0.299 
Exercise 4F 
3 5 
~ 38 b Ta 
2 2 
55 r ; 
3 a 72 b 33 
4 a 0.027 b 229 or 230 
20 
T 1 — 
5 a wo girls; 735 > T37 
b Equally likely; both = 
1 
6 = 
i 6 1 
7 a TANS b a 
8 a <= of 0.3525 b 20 500553 
9 a = or0.2 b 1 70.333 
c Z or0.875 
10 Z oron 
if 960.123 
ae . 13 
12 = 571 — 371 
a z Or 0.57 b ag 0r 0.37 
13 a y=0.44 b 5 0r0.35 
272 
14 = 0.36 0.812 or —— 
o! 335 
15 Woe 548 
16 0.48 
17 xe or 0.64 
End-of-chapter review exercise 4 
1 53 


2 > or 0.727 


4 
or 47 
z185 so” 


1 
b 316 or 0.00316 


“4 
4 —— or 0.310 


248 
1 — or 0.675 
6 a 


14 
i - 0.00566 
bt a 
li 82 or 0.166 
495 


7 a 0.3x+0.7y =0.034 and y=2x 
x = 0.02, y = 0.04 


b Z or 0.696 
8 
8 as 
ag 
b t 
4 
c i 
3 
ə i — dobre 
105 


ii < or 0.0571 
35 


10 i 0.85, 0.15/0.8, 0.2/0.4, 0.6 on branches with 


labels T, B/J, X/J, X. 
ii 5 or 0.654 


ll i 37 or 0.435 
85 


ii X or 0.396 
48 


12 


13 
14 


15 
16 


17 


18 


19 


20 


iii Yes; P(high GDP and high 
birth rate)= 0 
287 


iv ——or 0.431 
iv G66 or 


4 

15 

9 

19 

0.198 

a A and B both occur, or it shows that P(A ∩ B) ≠ 0. 


b Only two of the 36 outcomes, (1, 3) and (3, 1), are 
favourable to A and to B. 


a 


b 


No; = 1x to show P(AM B) + P(4) x P(B). 
Fa 


18 6 
0.26 
x = 54; 312 adults 
aw 
34 
w 2 
ii > 
4 
13 
b = 
30 
17 
72 
8 
35 
a i L 
8 
æ l 
ii — 
4 
b 22 


5 Permutations and combinations 
Prerequisite knowledge 


2 2 
P(A| B)==, P(B| A)== 
(A|B) 5 (B| A) a 
Exercise 5A 
1 a 20 b 6 c 294 
d 162 e 224 
2 a 10 b 9 c 4 
3 all b 15 c 22 
1x21 1x3! 15!x 4! 
4 Fete jna tjr 
es 7i 5) 72 16! 
: 2 
5 zg ™ 


e ees 8 al b 0 
22! 5! 
Es.§ 9! c 8 d 20 | 
515! — 41 + 2! + 2!) 9 x«>y+lorx=y+t2 or equivalent 
Exercise 5B Exercise 5E 
1 720 1 a 2520 b 3024 
2 a 807x10” 2 665280 
b 24 3 6840 
c 6227020800 4 a 182 b 196 
3 a2 b 720 c 40320 5 a 60 b 240 
4 a 24 b 6 c 5040 6 a 272 b 132 c 140 
5 39916800 7 a 60480 b 1680 
6 362880 8 360 
7 n=19 9 a l2 b 48 


10 120 ways for (r =)3 passengers to sit in (n =) 6 
empty seats on a train, or use of î P;, P}, or PR. 


Exercise 5C 1 n! 
1 a 120 b 360 c 45360 I` a r>zn” DAE ae 
d 34650 e 415800 12 132600 
2 A% a 13 18144 
<6 ae 14 a 6652800 
D a ws b 3024000 
c 6435 d 99768240 © 4959360 
4 First student is correct. Second student has treated 
them as two identical trees and three identical ‘ 
hone Exercise 5F 
5 a 1024 1 a 56 b 126 
b i 252 i 386 2 a 1960 b 980 c 121 
: 3 a 2598960 b 845000 
6 One letter appears three times; another appears 
twice, and two other letters appear once each 4 ai 230230 ii 230230 
(e.g. pontoon, feeless, seekers, orderer). b x=y+z 
7 a10 b 50 c 1050 5 16 
161 
Exercise 5D 7 a 120 b 34 
1 æa 120 c 12 d 66 
b i 48 ii 72 üi 18 8 45 
2 a 48 b 192 c 480 9 They can share the taxis in 56 ways, no matter 
d 144 e 0 which is occupied first. 
3 2:1 10 a 184756 b Two 
4 a 80640 b 241920 c 63504 d 88200 
5 a 3600 b 720 c 240 11 330 
6 a 20 h 40 12 1058400 
7 a6 b 180 c 36 13 a 252 b 56 c 175 
14 = 27907200 


15 72 
16 a 18 b 132 
Exercise 5G 
1 2 8 
1 a 3 b i c 15 
2 a = or 0.457 
27 
b gz 01 0.293 
Cc 3 
4 
3 a 0.0260 b 0.197 
4 0.0773 
5 0.501 
6 0.588 
5 1 
7 — b > 
a T6 2 
2 1 5 
8 a 3 b 1D c T2 
9 a 0.331 b 0.937 
10 a 50400 
: 1 x 1 
b 1 120 il 60 
1 2 
11 a sa T OOL b a 
12 0.290 
13 a a=166,b = 274, c = 488 
b 0.162 
28 
14 — or 0.683 
41 
15 Six tags and three labels. 
16 a Spee and = form =2,3,4 and 5, 


2 


wu 


n-l 


End-of-chapter review exercise 5 


1 a 30240 
b 240 
2 a 32659200 
b 8467200 
3 2 
28 


10 
11 


12 


13 
14 


15 


16 


17 


18 
19 


20 


a 1000000 

b i 09i 
ü 0.0001 

17 280 

i 1663 200 

ii 30 240 

iii 1 622 880 

iv 10 

a 756 756 

b 72072 

a 330 


87 

a 10⁹ or 1 × 10⁹ 

b 9³ × 10⁶ or 7.29 × 10⁸ 
c 9 × 10⁸ 
c 9x108 

5 53 x106 or 1.25 x 108 


14 

44 286 

a 453 600 

b 86 400 

1 

10 

a ll values; 35 
11 


b = 
12 


ii 18 


Cross-topic review exercise 2 


1 


a 96 
b -71 
c 93 oF 9.125 


2 a 62 c 39915800 
b Odd; S> d 59512320 
3 30856 16 a 229975200 
a i 48 b 0.75 
ii 24 17 a i sor 0.471 
b 120 
% Td 
5 a 27 ii ao 
1 wiz. 32 
= — or 0.209 
b i 3 iii is or 
. 4 b The events ‘being on the same side’ and 
u 9 ‘being in the same row’ are not independent. 
1440 
7 a 604800 6 Probability distributions 
a Prerequisite knowledge 
Kil 
8 a 134596 q ii & 
, 1 D 13 1 P(D)=0.11 
b i — ii — 8 1 
24 24 2 With replacement: P (both wda. A a ri or 0.25 
c i = more likely. a > 1 
19 Without replacement: P (both red) = =x ==— 
10 : e 5 5 
ii D ; less likely. or 0.20 
10 a 3 > 3 
b 15 
0.4 0.2 
ll a 8l 
b 15 ogee 
12 ai lai or 0.483 13713 
3 a 50k² − 25k + 3 = 0; k = 0.2, k = 0.3 
b k = 0.3 gives P(W = 12) = −0.1. 
c 0.14 
13 a 4 1 2 
28 49 
81 81 
5 
or appropriate Venn diagram. 1 2 3 
30 40 75 0.446 0.275 0.0527 
b Yes; e.g. = to sh 
eRT 00 WG 
P(Sand P) = P(S) x P(P). : 
14 a 3¹² or 531 441 
b 3¹⁰ or 59 049 l Z 3 
c 3⁷ or 2187 45 20 2 
15 a 3628800 a a al 
b 7257600 
10 


11 


12 


13 


14 


15 


16 


17 


Number of red grapes selected (R); R € {0,1} 
Number of green grapes selected (G); G € {4, 5} 
R+G=5 


0 1 2 

0.1 0.6 0.3 

0 1 2 
D 0.4096 0.4608 0.1296 


Hair colour and handedness are independent. 
a Proof 


b 415 16 7 18 |10 
3 NZ] 3/2] 2] 1 
16] 16| 16] 16] 16| 16 
a 
b 2 4 
6 6 1 
14 14 14 | 


a 0.374 

b N=0 is more likely than N =4; 
P(N’)>P(N) each time a book is selected. 

a Proof 


P(X is prime) = 5 


a P(heads)= 0.2 


b The number of tails obtained, but many others 


are possible, such as 2H and 0.5/7 


P(T > H)= 0.896 


a 1 | 2 3 
D s 10 
36 36 36 
1 
b pas 
3 


Exercise 6B 


NA nn bh U 


10 


11 


12 


13 
14 


E(X) =2.1; Var(X) = 0.93 
a p=0.2 
b E(Y)=1.84; SD(Y) = 0.946 
E(T)=5, Var(T)=11.5 
m = 16; Var(V ) = 31.3956 
Var(R) = 831 
a=11; Var(W)=79.8 
a E(grade) = 3.54; SD(grade) = 1.20; A smallish 
profit. 
SD = 1.20; variability of the profit. 
b E(grade) = 2.46, SD(grade) = 1.20 


Both are unchanged. 


b E(X)=85; PIX >EWY))= 5 
c Var(X)= 492 or 49.9 


0 1 2 3 


0.543 0.441 0.189 0.027 


b 900 times 

a E(G) = 0.8; E(B) = 1.2 

b 2:3; it is the same as the ratio for the number of 
girls to boys in the class. 


c Var(G) = 0.463 or ae 


725 
a Proof 
b E(R)=1.125 
c E(G)=1.5 
a $340 
b If the successful repayment rate is below 70%. 
a Proof b n=35 
a 1,2,3,5. 
i 2 5 
4 1 4 3 
12 12 12 12 
b P(S> 24) = $ 


-917 
e Var(S)= 278 
Answers


15 a Proof 10 i 
b 0 1 2 3 4 ii { 2.) a. 6 
X 1 12 | 54 | 108 |81 30 | 13 | 3 
Fey 256 | 256 | 256 | 256 | 256 70 | 70 | 70 
eee 6 
at = 5 the probability of not obtaining B m A aik 2g 
with each spin. Mra ue 
11 («=3):P(v>4)=2 
End-of-chapter review exercise 6
af -735 12 
1 (e=2] 20-25 or 2.36 
13 
—145 
Var(X) = ls or 1.23 
2 a q=130orq=48 
b 34 14 
3 a $6675 
b 4.27 ; 
e 1 1 I 
a 8 6 3 2 
11 i 
5 0.909 iig 
6 2a i : 15 i Proof 
i 3 ii 20 [17415 [134 
5 6|7 | 8|o 
b 0 1 2 45/45 |45| 45 
0.3 | 06 [p iii 134 or 13.3 
rA iv = or 0.444 
b O} 1] 2] 3] 4 8 | 9 {11 |14 |15 19 |24 i $ 
A ESE T 7 The binomial and geometric 
36|36|36 36} 36| 36| 36| 36| 36| 36| 36| 36| 36) 36 distributions 
E(S) = 53) or 5.86 Prerequisite knowledge 
8 a 0, 1,2, 4,5. 1 105 
1 9 2 27 
7 2 —+—+— += 
b OR RES 64 64 64 64 
TERIER 
Ne a Exercise 7A 
é 1 a 0.0016 b 0.4096 
c 0.0256 d 0.0272 
d 2 a 0.0280 b 0.261 
9 a b=lorb=6 c 0.710 d 0.552 
» 2B 3 a 0.0904 b 0.910 
30 c 0.163 d 0.969 


Copyright Material - Review Only - Not for Redistribution 


Cambridge International AS & A Level Mathematics: Probability & Statistics 1


4 a 0.121 b 0.000933 c 0.588 

d 0.403 e 0.499 

5 a 0.246 b 0.296

6 0.0146 

7 0.254 

8 a 0.140 b 0.000684 

9 0.177 

10 a 0.599 b 0.257 

11 0.349

12 a 0.291 b 0.648 

13 a 0.330 b 0.878 

14 a 0.15625 or 5/32 b 0.578

15 9 16 6 

17 16 18 23 

19 a 0.0098 b a=208,b=3 c 68 

20 a p = 0.5; the probability of more than 5m of
rainfall in any given month of the monsoon
season.

b The probability of more than 5m of rainfall
in any given month in the monsoon season is
unlikely to be constant, or whether one month
has more than 5m of rainfall is unlikely to be
independent of whether another has.

21 a 0.6561 b 0.227 
22 0.244 
Exercise 7B 
1 a 1,0.8and 0.894 b 13.2, 5.94 and 2.44 
c 65.7, 53.874 and 7.34 d 14.1,4.14 and 2.04 
2 a 2and1.5 b 0.311 c 0.367 
3 a 0.752 b 0.519 
4 a n=50,p=0.4 b 0.109 
5 a n=42, p=- b 0.0462 
6 n=3, p=0.9 
0 1 2 3
0.001 0.027 0.243 0.729
7 a E.g. X is not a discrete variable or there are more
than two possible outcomes. 

b E.g. Selections are not independent. 

c E.g. X can only take the value 0 or X is not a 
variable. 

8 n=18;0.364 
9 p = 0.75, k = 5157


10 a 6.006
b 5.93 and 5.93
c Proof
d 0.197
11 a 46
b 3.68 
c i 0.566 
ii 0.320 
Exercise 7C 
1 a 0.0524 b 0.91808 c 0.4096 
2 a 0.148 b 0.901 c 0.0672 
3 a 0.125 b 0.875 
4 a 0.0465 b 0.482 
5 a 0.24 b 0.922 c 0.0280 
6 a i 0.032 ii 0.0016 b 0.2016 
7 a i 0.0315 ii 0.484 iii 0.440 
b Faults occur independently and at random. 
8 a 0.21 b 0.21 c 0.21
9 a 0.364 b 0.547 
10 0.0433 
11 a Not suitable; trials not identical (p not constant).
b Not suitable; success dependent on previous two 
letters typed or X cannot be equal to 1 or 2 or p 
is not constant. 
c It is suitable.
d Not suitable; trials not identical (p not constant). 
12 0.096 
13 0.176 
14 0.977 or 335/343
15 a 0.0965 or … b 0.543
16 0.103
Exercise 7D 
1 y. 
9 
2 5 
3 i 
81 
4 Mode=1, mean =2 
5 6 and 0.335
6 a 16 b 0.00366 




7 a Thierry 10 k=21 
45 3 n-2 
b — or 0.0574 N p= 
784 °° 1 k-(3) TRT, 
8 a With replacement, so that selections are 12 a 12 
Po i 0.263 
i -— or 0.105 ii 0. 
b i 256 or ii 0.866 
I iii 0.0199 
ii 16 or 0.0625 13 i 0.735 
9  E(X)=500;b=1001 ii n=144;k=6 
10 a Any representation of the following sequence. 14 36:30:25 
T di, H s ge ° ° 
a 8 The normal distribution
b 0.52 +0.54 +0.56 +0.5° +... Prerequisite knowledge 
2 1 23.4 and 11.232 
3 2 #32, p=0.35 
End-of-chapter review exercise 7 Exercise 8A 
asi 1 a False b True c False 
A ( n } d False e True f False 
2 a 0.147 2 a i Op >09 
b 0.00678 ii Median for P< median for Q. 
3 į; Íl iii IQR for P> IQR for Q. 
k 81 b i Same as range of P. 
ii 0.0791 or Bs . 
1024 ii No; High values of W are more 
iii 0.09375 or 2 likely than low values or negatively
4 32 skewed. 
4 as es 
9 iii 
D 
b 0.394 Z i 3 
5 a 0.59049 z 
b 0.40951 5 P 
xo} 
c 0.242 £ 
6 a 10
b 0.00772 or 5/648
c 0.00162 
B 
T° a 2 a 
b 137x276 g & men 
8 a i 0.0706 3 
ii 0.0494 £ 
iii 0.1176 
b The students wear earphones
independently and at random.
[Graph: horizontal axis Height (cm)]
9 i 0.993 
ii n=22 
4 a [Graph: normal curves for apple juice and peach juice]
b Peach juice curve wider and shorter than apple juice curve; equal areas; both symmetrical; both centred on 340 ml.
5 a [Graph: normal curves; horizontal axis: Mass (kg)]
b USA curve wider, shorter and centred to the right of UK curve; equal areas; both symmetrical.
6 a Proof
b σ = 1.11 > σ = 0.663
[Graph]
Exercise 8B
1 a … b …
c 0.937 d 0.531
e 0.207 f 0.0224
g 0.0401 h 0.495
i 0.975 j 0.005
2 a 0.0606 b 0.380
c 0.0400 d 0.0975
e 0.190 f 0.211
g 0.770 h 0.948
i 0.719 j 0.066
3 a k = 1.333 b k = 0.111
c k = 0.600 d k = 1.884
e k = -0.674 f k = -0.371
g k = -1.473 h k = -0.380
i k = 1.71 j k = 1.035
4 a c = 0.473 b c = 0.003
c c = 2.10 d c = 1.245
e c = -0.500 f c = -2.14
g c = 3.09 h c = 1.96
i c = … j c = …
Exercise 8C
1 a 0.72… b 0.191 c 0.629
2 a 0.919 and 0.0808 b 0.613 and 0.387
c 0.964 and 0.0359 d 0.0467 and 0.953
e 0.285 and 0.715 f 0.954
g 0.423 h 0.319
i 0.231 j 0.0994
3 a a = 35.0 b b = 15.5
c c = 18.5 d d = 23.6
e e = 868
4 a f = 114 b g = 427
c … d j = 175
5 0.0513
6 0.933
7 σ = 2.68
8 μ = 126
9 μ = 588, σ = 14.7
10 μ = 93.8, σ = 63.8
11 μ = 5.00, σ = 6.40; 0.0620
12 μ = 7.08, σ = 1.95; 0.933
13 μ = 5.78, σ = 2.13; 0.372
…
Exercise 8D
1 0.662
2 a 0.191 b 74
3 a Small = 28.60%; medium = 49.95%; large = 21.45%
b k = 58.0 or 58.1




4  w=7.57 
5 a 0.567 b 0.874 
6 a b=240 b 82.0m 
7 9.099 days 
8 5000 
9 σ = 3.33
10 μ = 91.2; 28.8%
11 c = 3.88
12 a σ = 1.83 b 23
13 a μ = 25.0 b n = 1000
14 a 0.683 b 0.0456
c σ = 1.64, μ = 6.39
15 a 0.950 b n=14 
16 a 0.659 b 0.189 
17 a 0.284 b 0.0228 
Exercise 8E 
1 a Yes; μ = 12, σ² = 4.8
b No; nq = 1.5 < 5
c Yes; μ = 5.2, σ² = 4.524
d No; np = 3 < 5
2 a n = 209 b n = 34
c n = 111 d n = 17
3  B(56,0.25) 
4 0.837 
5 0.844 
6 a p = 0.625; Var(H) = 37.5
b 0.0432 
7 a Proof 
b 0.292;np=10>5 and nq =30>5 
8 a 44 b 4.45 
9 a i 0.187 ii 0.0118 
b E(X)=1600; Var(X) = 320 
c 0.874 
10 a i 0.105 ii 0.135
b 0.145 
11 0.0958 
12 a 0.0729 b 0.877 
13 a 0.239 b 0.0787 
14 a 0.789 b 0.920 
15 0.100 
16 0.748 
17 0.660 


c 0.136 


c 0.257 
c 0.118 


c 0.156 




End-of-chapter review exercise 8 


1 
2 
3 


13 


14 


15 


16 


0.841 

0.824 

i 0.590 

ii np=24>5andnq=6>5 
0.287 

0.239 


Probability density 


i 0.035 

ii 0.471 

iii k=103 

a i 315 or 316
ii 7350 
iii 0.840 

b 0.933 

σ = 2.35


μ = 3.285; 61.3%
5.69% 


i 0.238 
ii k = 116
iii 0.0910 
0.408 
0.483 
a μ = 17.5, σ² = 58.0


b i 17.0min 
ii 38.2 > 38 days 
a σ = 7.24
b k=15.1 
0.936 


Cross-topic review exercise 3 


1 a i Proof 
3 63 
ii E(X)=— X)=— ; 
ii ome a? Val ) coo 


iii 399/400 or 0.9975
10


11 


12 


13 


14 


b 


13/40 or 0.325


19/27

μ = 25.8, σ = 7.27
0.0228 


a 


oT Sp F 


eae o Teo Teer TFT 


t<) 


σ = 2.99

26.1% or 26.2% 
L= 34.0 

11.9% 

σ = 2.70

0.276 

0.822 


285 285 


0.253 

np > 5 and nq > 5

Proof 

0.432 

E(X)=4 

Proof 

0.315 

np = 8.4375 > 5 and nq = 11.5625 >5 


Del 
48? ~ 48 
1 3 
ii O<p<—or—<p<l 
il p a 7 p 


47 
48 


apel ie = < 
L 4 4° 2 


48 


Practice exam-style paper 


1 a 
b 
2 a 


N4 
om Bp Oo SS 


1.72m 
16.75 


x = 0.154; the value of P(A and B)
or P(A ∩ B)


P(A ∩ B) ≠ 0 or equivalent.
Proof 

36 

5 


9 
0.35 


95/137 or 0.693
9 9


11km 

61 

10.8km 

12 

First trial 

0.176 

x=7,y=4 

b=6.6 

It is neither central nor representative 
or 8 of the 10 values are less than 89. 


0 1 2
10/28 15/28 3/28
A ee 0.402 
112 
11 
Glossary


The following abbreviation and symbols are used in this
book.


No. Number of
≈ is approximately equal to
≠ is not equal to
∝ is proportional to
∴ therefore
≡ is identical to


A


Arrangements: see permutations 


Average: any of the measures of central tendency, 
including the mean, median and mode 


B 

Binomial distribution: a discrete probability distribution 
of the possible number of successful outcomes in a finite 
number of independent trials, where the probability of 
success in each trial is the same 


C 
Categorical data: see qualitative data 


Class: a set of values between a lower boundary and an 
upper boundary 

Class boundaries: the two values (lower and upper)
between which all the values in a class of data lie 

Class interval: the range of values from the lower 
boundary to the upper boundary of a class 

Class mid-value (or midpoint): the value exactly half-way 
between the lower boundary and the upper boundary of 
a class 

Class width: the difference between the upper boundary 
and the lower boundary of a class 

Coded: adjusted throughout by the same amount and/or 
by the same factor 

Combinations: the different selections that can be made 
from a set of objects 

Complement: a number or quantity of something required 
to make a complete set 

Continuity correction: an adjustment made when a discrete
distribution is approximated by a continuous distribution

Continuous data: data that can take any value, possibly
within a limited range


Cumulative frequency: the total frequency of all values less 
than a particular value 


Cumulative frequency graph: a graphical representation 
of the number of readings below a given value made 
by plotting cumulative frequencies against upper class 
boundaries for all intervals 


D 


Dependent (events): events that cannot occur without being 
affected by the occurrence of each other 


Discrete data: data that can take only certain values 


E 

Elementary event: an outcome of an experiment

Equiprobable: events or outcomes that are equally likely to
occur

Expectation: the expected number of times an event 
occurs 

Extreme value: an observation that lies an abnormal 
distance from other values in a set of data 


F 


Factorial: the product of all positive integers less than or 
equal to any chosen positive integer 


Fair: not favouring any particular outcome, object or 
person 


Favourable: leading to the occurrence of a required event

Frequency: the number of times a particular value occurs

Frequency density: frequency per standard interval


G 

Geometric distribution: a discrete probability distribution
of the possible number of trials required to obtain the first
successful outcome in an infinite number of independent
trials, where the probability of success in each trial is
the same

Grouped frequency table: a frequency table in which values 
are grouped into classes 


H 


Histogram: a diagram consisting of touching columns 
whose areas are proportional to frequencies 


I 


Independent (events): events that can occur without being 
affected by the occurrence of each other 
Interquartile range: the range of the middle half of the
values in a set of data; the numerical difference between 
the upper quartile and the lower quartile 


K 


Key: a note that explains the meaning of each value in a 
diagram 


L 


Lower and upper boundary: the smallest and largest values 
that can exist in a class of continuous data 


M 
Mathematical model: a description of a system using 
mathematical concepts and language 


Mean: the sum of a set of values divided by the number of 
values 


Median: the number in the middle of an ordered set of
values 


Modal class: the class of values with the highest frequency 
density 


Mode: the value that occurs most frequently 


Mutually exclusive (events): events that cannot occur at 
the same time because they have no common favourable 
outcomes 


N 
Normal curve: a symmetrical, bell-shaped curve 


Normal distribution: a function that represents the 
probability distribution of particular continuous random 
variables as a symmetrical bell-shaped graph 


O 


Ordered data: data arranged from smallest to largest 
(ascending) or largest to smallest (descending) 


Outliers: extreme values; observations that lie an abnormal
distance from other values in a set of data 


P 


Parameters: the fixed values that define the distribution of 
a variable 


PDF: see probability density function 

Permutations: the different orders in which objects can be 
selected and placed 

Probabilities: measurements on a scale of 0 to 1 of the 
likelihood that an event occurs 


Probability density function (PDF): a graph illustrating the 
probabilities for values of a continuous random variable 


Probability distribution: a display of all the possible values 
of a variable and their corresponding probabilities


Q 


Qualitative data: data that take non-numerical values

Quantitative data: data that take numerical values


Quartile: any of three measures that divide a set of data 
into four equal parts 


R 
Random: occurring by chance and without bias 


Range: the numerical difference between the largest and 
smallest values in a set of data 


Raw data: numerical facts and other pieces of information
in their original form 

Relative frequency: the proportion of trials in which a
particular event occurs 


S 


Selection: an item or number of items that are chosen

Skewed: unsymmetrical


Standard deviation: a measure of spread based on how far 
the data values are from the mean; the square root of the 
variance 


Standard normal variable: the normally distributed 
variable, Z, with mean 0 and variance 1 


Stem-and-leaf diagram: a type of table for displaying 
ordered discrete data in rows with intervals of equal 
widths 


Summarise: to give an accurate general description 


T 


Trial: one of a number of repeated experiments 


U 


Unbiased: not favouring any particular outcome, object or 
person 


V


Variance: the mean squared deviation from the mean; the 
square of the standard deviation 

Variation: dispersion; a measure of how widely spread out 
a set of data values is 
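
The glossary definitions of mean, variance and standard deviation can be checked numerically. The following is a minimal Python sketch, not taken from the book: the function names and the small data set are illustrative assumptions, and the variance is computed directly as the mean squared deviation from the mean.

```python
from math import sqrt


def mean(values):
    """Sum of the values divided by the number of values."""
    return sum(values) / len(values)


def variance(values):
    """Mean squared deviation from the mean."""
    m = mean(values)
    return sum((x - m) ** 2 for x in values) / len(values)


def standard_deviation(values):
    """Square root of the variance."""
    return sqrt(variance(values))


# Illustrative data set (an assumption, not from the book).
data = [2, 4, 4, 4, 5, 5, 7, 9]

print("mean:", mean(data))
print("variance:", variance(data))
print("standard deviation:", standard_deviation(data))
```

For this data set the mean is 5, the variance is 4 and the standard deviation is 2.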
Φ(z) 195-9
table of values 222 


addition law, mutually exclusive 
events 94-5 

averages see measures of central 
tendency 


bar charts 13 
binomial distribution 166 
expectation 173-4 
normal approximation 208-12 
variance 173-4
binomial expansions 167-8 
box-and-whisker diagrams 
(box plots) 60 


categorical (qualitative) data 2 
data representation 20 
central tendency, measures of see 
measures of central tendency 
class boundaries 6, 7 
class frequency 7, 8 
class widths 6, 7 
classes 
histograms 6 
stem-and-leaf diagrams 3, 4 
coded data 37-8, 40-1 
standardising a normal 
distribution 200-3 
variance and standard 
deviation 75-80
combinations 123, 135
ⁿCᵣ notation 135-6
problem solving 138-40
combined datasets 
mean 31-2, 73 
variance and standard 
deviation 72-3 
complement of an event 92 
conditional probability 108-9 
and dependent events 112-15 
and independence 111-12 
continuity corrections 209-10 
continuous data 3 
cumulative frequency graphs 13-15 
histograms 6-9 
continuous random variables 188-9 
normal distribution 190-206 
probability density functions 189-90 


cumulative frequency 13 
cumulative frequency graphs 13-15 
estimation of the median 43-4

interquartile range 58-9 


data representation 
box-and-whisker diagrams 60 
comparing different methods 20 
cumulative frequency 
graphs 13-15 
histograms 6-9
stem-and-leaf diagrams 3-4
data types 2-3 
de Moivre, Abraham 208 
dependent events 112-15 
deviation 65 
see also standard deviation 
discrete data 2-3 
stem-and-leaf diagrams 3-4
discrete random variables 150
binomial distribution 166-74
expectation 156-7, 158
geometric distribution 166, 175-82
probability distributions 150-2 
variance 157-8 


elementary events (outcomes) 91 
equiprobable events 91 
errors 188 
events 91 
dependent 112-15 
exhaustive 92 
independent 100-2, 111-12 
mutually exclusive 94-5 
expectation 92-3 
of the binomial distribution 173-4 
of a discrete random variable 
156-7, 158 
of the geometric distribution 180-1 
see also mean 


factorial function 124-5
fair (unbiased) selection 91 
Fermat's Last Theorem 156 
frequency density 7-9 


Gauss, Carl Friedrich 188, 205, 208 
geometric distribution 166, 175-8 
expectation 180-1 
mode 180 


grouped data 
mean 30-1 
variance and standard deviation 
66, 67-8 
grouped frequency tables 6 
estimation of the mean 32-3 


height variation 64 
histograms 6-9 

modal class 28 

use in image processing 13 


independent events 100-2 
application of the multiplication 
law 105-6
and conditional probability 111-12 
interquartile range 56 
box-and-whisker diagrams 60 
comparison with standard 
deviation 69 
grouped data 58-9 
ungrouped data 56-7 


mathematical models 166 
using the normal distribution 205-6
mean 27, 30-1, 44 
of the binomial distribution 173-4 
of coded data 37-8, 40-1 
of combined datasets 31-2, 73 
of a discrete random variable 
156-7, 158 
of the geometric distribution 180-1 
from grouped frequency 
tables 32-3 
of a normal distribution 190-1, 193 
measures of central tendency 27 
choosing an appropriate 
average 44-6
effect of extreme values 45
historical background 44
mean 30-41
median 42-4
mode and modal class 28-9
for skewed data 45-6
see also mean; median; mode
measures of variation 55
coded data 75-80
interquartile range and 
percentiles 56-9 
range 55-6 
variance and standard deviation 
65-9, 72-3 
see also standard deviation; variance 
median 27, 42-3, 44, 56 
estimation from a cumulative 
frequency graph 43-4 
modal class 28-9 
mode 27, 44 
of the geometric distribution 180 
multiplication law for independent 
events 100-2 
application of 105-6
multiplication law of probability 112-15
mutually exclusive events 94-5


normal curve 190-1 

normal distribution 188, 193 
approximation to the binomial
distribution 208-12
modelling with 205-6
properties of 194 
standard normal variable (Z) 195-9 
standardising 200-3 
tables of values 222 


parameters 
of a binomial distribution 167 
of a geometric distribution 176 
of a normal distribution 193 
Pascal's triangle 168, 172 
percentiles 58-9 
permutations 123, 134 
of n distinct objects 125-6
of n distinct objects with
restrictions 129-31
of n objects with
repetitions 127-8
ⁿPₙ notation 125
ⁿPᵣ notation 132
problem solving 138-40
of r objects from n objects 132-3 




possibility diagrams (outcome 
spaces) 101 
possibility space 95 
probability 91 
addition law 94-5 
conditional 108-9, 111-15 
dependent events 112-15 
experiments, events and 
outcomes 91-3
independent events 100-2, 111-12 
multiplication law for independent 
events 100-2, 105-6 
multiplication law of
probability 112-15 
mutually exclusive events 94-5 
Venn diagrams 95-7 
probability density functions 
(PDFs) 189-90 
probability distributions 150-2 
binomial distribution 166-74 
geometric distribution 175-82 
normal distribution 190-206 


qualitative (categorical) data 2
data representation 20 
quantitative data 2-3 
quartiles 55 
grouped data 58-9 
ungrouped data 56-7


random selection 91-2 
range 55-6 
repetitions, permutations 
with 127-8 
restrictions, permutations with 129-31


selection, random 91-2 

set notation 95 

sigma (Σ) notation 30

skewed data 
box-and-whisker diagrams 60 
measures of central tendency 45-6 
skewed distributions 190
standard deviation 65-8
of the binomial distribution 173-4
calculation from totals 72 
coded data 75-80 
of combined datasets 72-3 
comparison with interquartile 
range 69 
of a discrete random variable 158 
of a normal distribution 190-1, 194
standard normal variable (Z) 195-9
standardising a normal distribution
200-3
stem-and-leaf diagrams 3-4
interquartile range 57 
median 42 


tree diagrams 
for independent events 100-1 
for permutations 125-6 
trials 92-3 


variables 
notation 150 
see also continuous random 
variables; discrete random 
variables 
variance 65-8 
of the binomial distribution 173-4
calculation from totals 72
of coded data 75-80
of combined datasets 72-3
of a discrete random variable 157-8 
equivalence of two formulae for 81 
of a normal distribution 193 
variation 55 
see also measures of variation 
Venn diagrams 95-7 


Wiles, Andrew 156 


Z (standard normal variable) 195-9 